ABSTRACT

A symmetrical moving weighted average (MWA) for smoothing observational data which may be regarded as equally spaced measurements of a function of one variable has the form

$$u_x = \sum_{j=-m}^{m} c_j y_{x-j} , \tag{1}$$

where $y_x$ is an observed value, $u_x$ is the corresponding smoothed value, and the $c_j$ are real coefficients whose sum is unity, with $c_{-j} = c_j$. This process does not yield smoothed values of the first m and the last m observations unless additional data are available. A natural method is suggested for extending the smoothing to the extremities of the data.

If (1) is exact for polynomials up to the degree 2s - 1, it can be written in the form

$$u_x = [1 - (-1)^s \delta^{2s} q(E)]\, y_x ,$$

where $\delta$ is the finite difference taken centrally, E is defined by $Ef(x) = f(x + 1)$, and

$$q(E) = \sum_{j=-m+s}^{m-s} q_j E^j$$

for some coefficients $q_j$. If q(z) has no zero on the unit circle, there is a Laurent expansion

$$[q(z)]^{-1} = \sum_{j=-\infty}^{\infty} h_j z^j$$

convergent in an annulus containing the unit circle.

We regard the overall smoothing process as a matrix-vector operation

$$u = Gy , \tag{2}$$

where u and y are vectors of N components and G is symmetric with rows, except for the first m and the last m, that merely reflect the application of (1). We determine the first m and the last m rows by taking

$$G = I - K^T D K ,$$

where K is the matrix of N - s rows and N columns that transforms a vector into the vector of sth finite differences of its components, and D is the symmetric matrix of order N - s whose inverse is the Toeplitz matrix $T = (t_{ij})$, with $t_{ij} = h_{i-j}$.

The same vector u can be obtained by a computational short-cut. Let p(z) be the monic polynomial of degree m - s whose zeros are those zeros of q(z) lying within the unit circle, and let

$$a(z) = (z - 1)^s p(z) = z^m - \sum_{j=1}^{m} a_j z^{m-j} .$$

Then, if the range of x is from A to B, recursively calculate fictitious extended values $y_x$ for x = A - 1, A - 2, ..., A - m by

$$y_x = \sum_{j=1}^{m} a_j y_{x+j} .$$

Similarly, calculate extended values for x = B + 1, B + 2, ..., B + m recursively by

$$y_x = \sum_{j=1}^{m} a_j y_{x-j} .$$

Finally, apply (1) to the entire sequence of observed and extended values to obtain smoothed values for x = A, A + 1, ..., B.

Schoenberg (1946) defined the characteristic function of (1) as

$$\phi(t) = \sum_{j=-m}^{m} c_j e^{ijt}$$

and called an MWA a smoothing formula if

$$-1 \le \phi(t) \le 1 ,$$

with some ambiguity as to whether the inequalities should be strict for $0 < t < 2\pi$.

It is shown here that the limit $\lim_{n \to \infty} G^n$ exists for all N > 2m if and only if

$$-1 \le \phi(t) < 1 \quad \text{for } 0 < t < 2\pi .$$

If $0 \le \phi(t) < 1$ for $0 < t < 2\pi$, (2) is equivalent to the minimization of

$$(u - y)^T (u - y) + (Ku)^T H K u ,$$

where $H = (D^{-1} - K K^T)^{-1}$ is positive definite. This generalizes the Whittaker (1923) smoothing process.


SIGNIFICANCE AND EXPLANATION

The use of a moving weighted average of 2m + 1 terms to smooth equally spaced observations of a function of one variable does not yield smoothed values of the first m
and  the  last  m observations,  unless  additional  data  beyond  the  range  of  the  original 
observations  are  available.  Using  Toeplitz  matrices,  Laurent  series,  and  analogies  to 
the  Whittaker  smoothing  process,  we  develop  a natural  method  of  extending  the  smoothing 
to  the  extremities  of  the  data. 
1. INTRODUCTION

A time-honored method of smoothing equally spaced observations of a function of one variable to remove or reduce unwanted irregularities is the moving weighted average (MWA). An example is Spencer's 15-term average (Macaulay 1931; Henderson 1938), which can be expressed in the form

$$u_x = \frac{1}{320}\,(-3y_{x-7} - 6y_{x-6} - 5y_{x-5} + 3y_{x-4} + 21y_{x-3} + 46y_{x-2} + 67y_{x-1} + 74y_x + 67y_{x+1} + 46y_{x+2} + 21y_{x+3} + 3y_{x+4} - 5y_{x+5} - 6y_{x+6} - 3y_{x+7}) , \tag{1.1}$$

where $y_x$ is the observed value corresponding to the argument x, and $u_x$ is the corresponding adjusted value. Actuarial writers commonly refer to such smoothing of data as "graduation."

More generally (Schoenberg 1946) a symmetrical MWA is of the form

$$u_x = \sum_{j=-m}^{m} c_j y_{x-j} , \tag{1.2}$$

where m is a given positive integer and the real coefficients $c_j$ are such that $c_{-j} = c_j$ and

$$\sum_{j=-m}^{m} c_j = 1 .$$


Such  averages  have  a long  history,  that  includes  some  eminent  names,  but  the  literature 
concerning  them  is  little  known  in  the  general  mathematical  community.  Among  the  early 
writers  on  the  subject  was  the  Italian  astronomer  G.  V.  Schiaparelli  (1866) , who  is  chiefly 
remembered  for  his  observations  of  the  planet  Mars.  The  majority  of  publications  in  this 
area  have  appeared  in  English  and  Scottish  actuarial  journals  starting  with  John  Finlaison 
in  1829  (see  Maclean  1913) . Probably  the  first  writer  to  make  a systematic  investigation 
of  such  averages  was  the  American  mathematician  E.  L.  De  Forest  (1873,  1875,  1876,  1877)  . 
His  work,  published  in  obscure  places,  was  rescued  from  total  oblivion  largely  through  the 
efforts  of  Hugh  H.  Wolfenden  (1892-1968) , who  also  made  important  contributions  to  the 
subject  (Wolfenden  1925) . E.  T.  Whittaker  (1923)  suggested  an  alternative  method  of 
smoothing, which has been widely employed, especially by actuaries, and will be referred to extensively later, because of its numerous analogies to the MWA procedure. The first writer to apply sophisticated mathematical tools to the study of these averages was I. J. Schoenberg (1946, 1948, 1953), who introduced the notion of the characteristic function of an MWA, and utilized it to formulate a criterion for judging whether a given average can properly be called a "smoothing formula." This criterion will be discussed in Section 10.

2. THE PROBLEM OF SMOOTHING NEAR THE EXTREMITIES OF THE DATA

When MWA's have been used by actuaries, the argument x is usually age (of a person) in
completed  years.  When  they  are  used  for  smoothing  economic  time  series,  x denotes  the 
position  of  a particular  observation  in  a time  sequence.  The  latter  area  of  application 
appears  to  stem  largely  from  the  work  of  Frederick  R.  Macaulay  (1931),  who  was  the  son  of 
an  actuary. 

In either case, a serious disadvantage of the method is that it does not produce adjusted values for arguments too near the extremities of the data. For example, suppose
Spencer’s  15-term  average  is  used  to  smooth  monthly  data  extending  from  1970  through  1976. 

The  formula  does  not  give  smoothed  values  for  the  first  7 months  of  1970  or  the  last  7 
months  of  1976  unless  data  can  be  obtained  for  the  last  7 months  of  1969  and  the  first  7 
months  of  1977.  Clearly,  acquisition  of  data  extending  farther  into  the  past  is  less  of  a 
problem  than  acquisition  of  future  data. 

Actuaries  in  North  America  seem  to  have  largely  abandoned  the  use  of  MWA's  in  favor 
of  Whittaker's  method,  which  does  not  have  the  disadvantage  described.  It  is  likely  that 
British  actuaries  may  still  use  these  averages  to  some  extent.  They  appear  to  be  currently 
employed  by  economic  and  demographic  statisticians  (Shiskin,  Young,  and  Musgrave  1967). 

Various suggestions have been made (De Forest 1877, Miller 1946, Greville 1957, 1974a) for dealing with the problem of adjustment of data near the extremities, but none of them has won general acceptance. De Forest's (1877, p. 110) suggestion is so relevant to the subject of the present paper that it is worth quoting in full:

"As the first m and the last m terms of the series cannot be reached directly by the formula [of 2m + 1 terms], the series should be graphically extended by m terms at both ends, first plotting the observations on paper as ordinates, and then extending the curve along what seems to be its probable course, and measuring the ordinates of the extended portions. It is not necessary that this extension should coincide with what would be the true course of the curve in those parts. The important part is that the m terms thus added, taken together with the m + 1 adjacent given terms, should follow a curve whose form is approximately algebraic and of a degree not higher than the third."

Elsewhere  (Greville  1974a)  I have  proposed  extrapolating  the  observed  data  by  fitting 
a least-squares  cubic  to  the  first  m + 1 values  and  a similar  cubic  to  the  last  m + 1 
observations.  This  is  very  much  in  the  spirit  of  De  Forest's  suggestion;  it  is  not  a long 
step  from  graphic  to  algebraic  extrapolation. 

Another  approach  (Greville  1957)  regards  the  adjustment  process  as  a matrix-vector 
operation.  We  write 

u = Gy  , 

where  y is  the  vector  of  observed  values,  u is  the  corresponding  vector  of  adjusted 
values,  and  G is  a square  matrix.  If  a specified  symmetrical  MWA  of  2m  + 1 terms  is 
to  be  used  wherever  possible,  then  the  nonzero  elements  of  G,  except  for  the  first  m 
and  the  last  m rows,  are  merely  the  weights  in  the  moving  average,  these  weights  moving 
to  the  right  as  one  proceeds  down  the  rows  of  the  matrix.  In  the  first  m and  the  last 
m rows special unsymmetrical weights, determined in some appropriate manner, must be inserted. The matrix approach and the extrapolation approach are not wholly unrelated, since the final
results  of  the  extrapolation  approach  can  be  expressed  in  matrix  form. 

It  is  the  purpose  of  the  present  paper  to  show  that  when  a given  MWA  is  being  employed 
there  is  a natural,  preferred  method  of  extending  the  adjustment  to  the  extremities  of  the 
data,  strongly  suggested  by  the  mathematical  properties  of  the  weighted  average.  This 
natural  method  of  extension  seems  to  have  eluded  previous  writers  on  the  subject,  as  indeed 
it  eluded  me  during  the  many  years  I have  thought  about  the  matter.  The  preferred  method 
of  extension  has  the  interesting  property  that  it  can  be  arrived  at  either  through  the 
matrix  approach  or  the  extrapolation  approach.  In  the  latter  case,  one  must  employ  a 
special extrapolation formula uniquely determined by the given MWA. Though the two approaches appear to be quite different, they will be shown in Section 9 to be mathematically equivalent, and they will give identical results except for rounding error.


In  my  own  thinking  I arrived  at  the  procedure  first  through  the  matrix  approach, 
guided largely by extensive analogies to the Whittaker process (which is most conveniently expressed in matrix terms). It was only later that I became aware that identical results could be obtained by means of an extrapolation algorithm. Though the matrix approach provides far greater insight into the rationale behind the procedure, the extrapolation approach is simpler computationally. Therefore, we shall first describe and illustrate the
extrapolation  algorithm,  and  shall  then  motivate  and  justify  the  procedure  by  means  of  the 
matrix  approach. 

The  extrapolation  approach  is  merely  a computational  short  cut,  and  nearly  always  the 
extended  values  obtained  by  its  use  are  highly  unrealistic  if  regarded  as  extrapolated 
values of the function under observation. This fact is irrelevant, but has seriously "turned off" some users. Hereafter I shall therefore avoid the use of the words "extrapolate" and "extrapolation," and shall speak of "extension," "extended values," and "intermediate values."

It is emphasized that the procedure to be described (or any other procedure for completing the graduation) is recommended for use only when additional data extending beyond the range of the original data are not available.

3.  THE  EXTENSION  ALGORITHM 


A weighted average of the form (1.2) will be called exact for the degree r if it has the property that, in case all the observed values $y_{x-j}$ in (1.2) should happen to be the corresponding ordinates of some polynomial P(x - j) of degree r or less, then

$$u_x = y_x = P(x) . \tag{3.1}$$

In other words, an average that is exact for the degree r reproduces without change polynomials of degree r or less. If the weights are symmetrical, r must be odd, and we may write r = 2s - 1. This implies that r < 2m + 1, and therefore $s \le m$.
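Exactness for the degree r can be checked through the moments of the weights: substituting $P(x - j)$ into (1.2) and expanding in powers of j shows that the average reproduces every polynomial of degree r or less precisely when the weights sum to unity and $\sum_j c_j j^k = 0$ for k = 1, ..., r. The following is a minimal sketch of that check for Spencer's 15-term weights of (1.1); the helper name `moment` is an illustrative assumption, not part of the paper.

```python
from fractions import Fraction

# Numerators of Spencer's 15-term weights c_0 .. c_7 (common denominator 320),
# taken from equation (1.1).
half = [74, 67, 46, 21, 3, -5, -6, -3]
weights = {j: Fraction(half[abs(j)], 320) for j in range(-7, 8)}

def moment(k):
    """k-th moment of the weights, sum_j c_j * j**k (hypothetical helper)."""
    return sum(c * j**k for j, c in weights.items())

moments = [moment(k) for k in range(5)]
# moments[0] is the sum of the weights; moments[1..3] vanish exactly when the
# average is exact for cubics; a nonzero moments[4] shows it is not exact
# for quartics.
print(moments)
```

Exact rational arithmetic is used so that the vanishing of the second and third moments is not obscured by rounding.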

For a simple (unweighted) average, r = 1. For the overwhelming majority of MWA's
used  in  practice,  r = 3 . The  preference  for  cubics  has  a long  history.  De  Forest  (1873, 
p.  281)  suggests  that  "a  curve  of  the  third  degree,  which  admits  a point  of  inflexion  ... 
is  ...  better  adapted  than  the  common  parabola  to  represent  the  form  of  a series  whose 
second  difference  changes  its  sign." 
We shall use the notation of the calculus of finite differences, wherein E is the "displacement operator" defined by

$$E f(x) = f(x + 1) ,$$

and $\delta$ is the "central difference" operator defined by

$$\delta f(x) = f(x + \tfrac{1}{2}) - f(x - \tfrac{1}{2}) ,$$

so that

$$\delta^2 f(x) = f(x + 1) - 2f(x) + f(x - 1) .$$

If the weighted average (1.2) is exact for the degree 2s - 1, it can be written in the form

$$u_x = [1 - (-1)^s \delta^{2s} q(E)]\, y_x , \tag{3.2}$$

where q(E) is of the form

$$q(E) = \sum_{j=-m+s}^{m-s} q_j E^j \tag{3.3}$$

with $q_{-j} = q_j$. In a typical smoothing formula q(E) has only positive coefficients, but this is not necessarily the case. If q(z) is multiplied by $z^{m-s}$ to eliminate negative exponents, the resulting polynomial is of degree 2m - 2s. Because of the symmetry of the coefficients, it is a reciprocal polynomial. In other words, if r is a zero of the polynomial, it follows that $r^{-1}$ is a zero. We shall make the assumption that this polynomial has no zero on the unit circle. If it does have such zeros, the extension of the smoothing process to the extremities of the data is undefined.
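The factorization (3.2) can be carried out numerically. Multiplying through by $z^m$, the polynomial whose coefficients are those of "identity minus MWA" must be divisible by $(z - 1)^{2s}$, and the quotient is $(-1)^s z^{m-s} q(z)$. The sketch below, which assumes NumPy and uses Spencer's 15-term weights from (1.1) scaled by 320, recovers the coefficients of q in this way; the variable names are illustrative only.

```python
import numpy as np

s, m = 2, 7
half = [74, 67, 46, 21, 3, -5, -6, -3]          # 320*c_0 .. 320*c_7
c320 = np.array(half[:0:-1] + half, dtype=float)  # 320*c_j for j = -7 .. 7

# Coefficients of 320 * (identity - MWA), written as a polynomial of degree 2m.
num = -c320
num[m] += 320.0

# (z - 1)^(2s) is delta^(2s) multiplied by z^s.
den = np.poly(np.ones(2 * s))

quotient, remainder = np.polydiv(num, den)
q320 = (-1) ** s * quotient                      # 320*q_j for j = -(m-s) .. m-s

print(q320)
print(remainder)
```

A remainder that is zero to rounding confirms that the average is exact for the degree 2s - 1; the quotient is symmetric, as the text requires.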

Let p(z) denote the polynomial of degree m - s with leading coefficient unity whose zeros are the m - s zeros of $z^{m-s} q(z)$ located within the unit circle. In general, some or all of these zeros are complex, but they must occur in conjugate pairs, so that p(z) has real coefficients. Now we define a polynomial a(z) of degree m and its coefficients $a_j$ by

$$a(z) = (z - 1)^s p(z) = z^m - \sum_{j=1}^{m} a_j z^{m-j} . \tag{3.4}$$

Suppose the given data consist of N = B - A + 1 given values extending from x = A to x = B. We assume that $N \ge 2m + 1$, so that at least one smoothed value is obtained by direct application of the given MWA. Then we obtain m intermediate values to the left of x = A by successive application of the recurrence

$$y_x = \sum_{j=1}^{m} a_j y_{x+j} . \tag{3.5}$$

Similarly, m intermediate values to the right of x = B will be obtained by the analogous recurrence

$$y_x = \sum_{j=1}^{m} a_j y_{x-j} .$$

Finally, application of the symmetrical MWA of 2m + 1 terms to the N + 2m observed and intermediate values gives adjusted values $u_x$ for x = A, A + 1, ..., B.
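The three steps just described (extend to the left, extend to the right, then apply the MWA) can be sketched as follows. This is an illustrative implementation assuming NumPy, not code from the paper; the function name `extend_and_smooth` and its argument layout are assumptions. The example uses the 5-term minimum-R average, for which s = m = 2 and a(z) = (z - 1)^2, so that $a_1 = 2$ and $a_2 = -1$.

```python
import numpy as np

def extend_and_smooth(y, c_half, a):
    """Smooth y with a symmetrical MWA, extended to both ends.

    c_half = [c_0, c_1, ..., c_m]  (half of the symmetric weights)
    a      = [a_1, ..., a_m]       (extension coefficients of (3.4))
    """
    m = len(a)
    ext = [float(v) for v in y]
    # Right-hand intermediate values: y_x = sum_j a_j * y_{x-j}.
    for _ in range(m):
        ext.append(sum(a[j] * ext[-1 - j] for j in range(m)))
    # Left-hand intermediate values: y_x = sum_j a_j * y_{x+j}.
    for _ in range(m):
        ext.insert(0, sum(a[j] * ext[j] for j in range(m)))
    # Full symmetric weight vector c_{-m} .. c_m.
    weights = np.array(list(c_half[:0:-1]) + list(c_half))
    # "valid" mode keeps exactly the N positions A .. B.
    return np.convolve(ext, weights, mode="valid")

# 5-term average: weights from Table 2, extension by second differences.
c_half = [0.559440, 0.293706, -0.073426]
a = [2.0, -1.0]
y_line = 3.0 + 0.5 * np.arange(10)
u_line = extend_and_smooth(y_line, c_half, a)
print(u_line)
```

Because the factor $(z - 1)^s$ in a(z) makes the recurrences reproduce polynomials of degree s - 1, straight-line data come back unchanged at every point, including the ends.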

For example, Spencer's 15-term formula (1.1) can be expressed in the form (3.2) with s = 2, where

$$q(E) = \frac{1}{320}\,(3E^{-5} + 18E^{-4} + 59E^{-3} + 137E^{-2} + 242E^{-1} + 318 + 242E + 137E^{2} + 59E^{3} + 18E^{4} + 3E^{5}) .$$

Using a computer program to find the zeros of $z^5 q(z)$, constructing the polynomial p(z), and finally applying the formula (3.4), we obtain for Spencer's 15-term formula

$$a(z) = z^7 - .961572z^6 - .372752z^5 - .015904z^4 + .123488z^3 + .125229z^2 + .075887z + .025624 .$$

The coefficients are rounded to the nearest sixth decimal place, except that the final digits of the coefficients of $z^3$ and $z^2$ have been adjusted by one unit to make the sum of the coefficients exactly zero.
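The computation described for Spencer's formula can be reproduced with a general-purpose root finder. The sketch below assumes NumPy; it takes the coefficients of $z^5 q(z)$ (the common factor 1/320 does not affect the zeros), keeps the zeros inside the unit circle, and multiplies out (3.4). It is offered as an illustration of the procedure, not as the program the author used.

```python
import numpy as np

s, m = 2, 7
# Coefficients of z^5 q(z) for Spencer's 15-term average, times 320.
q = np.array([3, 18, 59, 137, 242, 318, 242, 137, 59, 18, 3], dtype=float)

zeros = np.roots(q)
inside = zeros[np.abs(zeros) < 1.0]      # the m - s zeros within the unit circle

# Monic polynomial p(z); conjugate pairs make the coefficients real,
# so the imaginary parts are only rounding noise.
p = np.real(np.poly(inside))

# a(z) = (z - 1)^s p(z), highest power first.
a_poly = np.polymul(np.poly(np.ones(s)), p)

# Extension coefficients a_1 .. a_m are the negatives of the lower coefficients.
a_ext = -a_poly[1:]
print(np.round(a_poly, 6))
```

The coefficients of `a_poly` sum to zero because a(1) = 0, which is the property used in the text to adjust the final digits.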

Note  that  in  the  trivial  case  s = m,  q(z)  is  a constant  and  p(z)  is  unity.  Thus 
the  algorithm  reduces  to  extrapolation  of  the  observed  data  by  sth  differences  (i.e.,  by 
fitting  a polynomial  of  degree  s - 1 to  the  first  s observations). 

As  a numerical  illustration,  Spencer's  15-term  average  has  been  applied  to  some 
meteorological  data.  Table  1 and  Figure  A show  the  observed  and  graduated  values  of  monthly 
precipitation  in  Madison,  Wisconsin  in  the  years  1967-71.  No  adjustment  has  been  made  for 
the  unequal  length  of  the  months. 


Table 1. Monthly Precipitation (inches), Madison, Wisconsin, 1967-71

Year and Month    Observed   Graduated     Year and Month    Observed   Graduated
                   Value      Value                           Value      Value

1967 January        1.63       1.11        1969 July           4.28       3.81
     February       1.17       1.63             August         0.96       3.17
     March          1.49       2.24             September      1.35       2.33
     April          2.57       2.88             October        2.65       1.56
     May            3.53       3.42             November       0.70       1.06
     June           6.46       3.74             December       1.66       0.82
     July           2.51       3.85        1970 January        0.44       0.90
     August         ?.71       3.75             February       0.16       1.25
     September      2.68       3.42             March          1.17       1.78
     October        5.52       2.92             April          2.53       2.39
     November       1.83       2.31             May            6.09       2.94
     December       1.89       1.69             June           2.26       3.3?
1968 January        0.56       1.31             July           2.42       3.63
     February       0.49       1.36             August         0.97       3.69
     March          0.59       1.87             September      8.82       3.50
     April          4.18       2.69             October        2.65       3.20
     May            2.02       3.49             November       1.06       2.74
     June           7.82       3.91             December       2.12       2.28
     July           2.54       3.92        1971 January        1.48       1.94
     August         2.58       3.54             February       2.59       1.76
     September      4.45       2.97             March          1.52       1.74
     October        0.85       2.45             April          2.42       1.81
     November       1.74       1.99             May            0.98       1.93
     December       2.89       1.64             June           2.27       2.02
1969 January        2.26       1.56             July           1.65       2.13
     February       0.18       1.81             August         3.96       2.24
     March          1.47       2.35             September      1.87       2.40
     April          2.72       3.13             October        1.30       2.63
     May            3.45       3.81             November       3.48       2.84
     June           7.96       4.05             December       3.64       3.28

SOURCE: Observed values from U. S. Department of Commerce, National Oceanic and Atmospheric Administration, Environmental Data Service, Local Climatological Data, Annual Summary with Comparative Data, Madison, Wisconsin, 1972. National Climatic Center, Asheville, N. C., 1973.
For the convenience of the user, the weighted-average coefficients and the intermediate-value coefficients for those averages that appear to be in common use or are found in publications accessible to me are given in the next section in Tables 2 and 3. The reader who is more interested in the justification of the procedure and the rationale behind it may skip at once to Section 5.

4.  TABLES  OF  MOVING-AVERAGE  AND  EXTENSION  COEFFICIENTS 

Tables 2 and 3 show the coefficients in the MWA and the corresponding extension coefficients (that is, $c_j$ and $a_j$) for 21 weighted averages that have appeared in the literature. Table 2 is devoted to the class of averages known to actuaries as minimum-$R_3$ formulas and to economic statisticians as "Henderson's ideal" formulas. They are discussed more fully in Section 7. The values in Table 2 are shown to six decimal places. In both instances, a few final digits have been adjusted by one unit to make the sum exactly unity. The moving-average coefficients are given to the nearest sixth decimal place except for the slight adjustments mentioned; rounding error in the computation of the extension coefficients may have introduced further small errors in some instances.

Table 3 is concerned with 11 moving averages derived by various writers on an ad hoc basis and known by the names of their originators. The source notes for this table do not attempt to cite the earliest publication of the formula in question, but merely indicate a convenient reference where it can be found. All these averages are exact for cubics except Hardy's, which is exact only for linear functions. The coefficients in the averages of Table 3 are rational fractions with relatively small denominators, and the user will probably find it convenient to use as weights the integers in the numerators of the coefficients, dividing by the common denominator as the final step. The column headings, therefore, are $c_j$ multiplied by the common denominator.

In both Tables 2 and 3 advantage has been taken of the symmetry of the coefficients $c_j$ to reduce the length of the columns by approximately one-half. The manner of using the tables may be illustrated by taking Spencer's 15-term average as an example. Equation (1.1) shows the calculation of the moving averages. The intermediate values $y_x$ for x = A - 1 to A - 7 are calculated successively by the formula

$$y_x = .961572y_{x+1} + .372752y_{x+2} + .015904y_{x+3} - .123488y_{x+4} - .125229y_{x+5} - .075887y_{x+6} - .025624y_{x+7} .$$

The intermediate values for x = B + 1 to B + 7 are calculated by the identical formula except that the plus signs in the subscripts are changed to minus signs.

The extension procedure drastically reduces the number of values that need to be tabulated for a given weighted average, and makes it possible, for example, to give complete information about 21 such averages in the reasonably compact Tables 2 and 3. However, the user who intends to apply a single weighted average to many data sets may prefer to tabulate the atypical elements of the smoothing matrix G for that weighted average, and so avoid the extra step of calculating the intermediate values. For the benefit of such users, a method of calculating the atypical rows of G will now be described. Justification of the procedure will be given in Section 9 (see equation (9.10)). We observe that the nonzero elements in each row of G except the first m and the last m rows are merely the coefficients $c_j$ of the MWA centered about the diagonal element. The elements in the first m rows of G, except for the first m columns, follow from the symmetry of G, and if $G = (g_{ij})$, we have

$$g_{ij} = c_{j-i} .$$

This leaves only the square submatrix of order m in the upper left corner to be calculated. Let c denote the constant $-q_{m-s}/p_{m-s}$, where $-p_{m-s}$ is the term free of z in the polynomial p(z), and let $A_1 = (a_{ij})$ denote the square matrix of order m given by

$$a_{ij} = \begin{cases} 0 & \text{for } i > j , \\ 1 & \text{for } i = j , \\ -a_{j-i} & \text{for } i < j . \end{cases}$$

Then the required submatrix in the upper left corner of G is given by

$$I - c A_1^T A_1 .$$

The similar submatrix in the lower right corner of G contains the same elements, but with the order of both rows and columns reversed.
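The corner submatrix can be compared numerically with the result of the extension algorithm. The sketch below assumes NumPy and Spencer's 15-term average (m = 7, s = 2), using the rounded coefficients printed in the text, so agreement is expected only to about the accuracy of those six-decimal values. The constant c is formed as $q_{m-s}$ divided by the term of p(z) free of z, and that term is obtained here from $a(0) = (-1)^s p(0)$ with $a(0) = -a_m$; the function name and the choice N = 20 are illustrative assumptions.

```python
import numpy as np

m, s = 7, 2
c_half = np.array([74, 67, 46, 21, 3, -5, -6, -3]) / 320.0   # c_0 .. c_7
a = np.array([0.961572, 0.372752, 0.015904, -0.123488,
              -0.125229, -0.075887, -0.025624])              # a_1 .. a_7

def smooth_with_extension(y):
    """Extension algorithm followed by the MWA (illustrative helper)."""
    ext = [float(v) for v in y]
    for _ in range(m):                       # values to the right of B
        ext.append(sum(a[j] * ext[-1 - j] for j in range(m)))
    for _ in range(m):                       # values to the left of A
        ext.insert(0, sum(a[j] * ext[j] for j in range(m)))
    weights = np.concatenate([c_half[:0:-1], c_half])
    return np.convolve(ext, weights, mode="valid")

# Smoothing matrix G, built column by column from unit vectors.
n = 20
G = np.column_stack([smooth_with_extension(e) for e in np.eye(n)])

# Corner submatrix I - c * A1^T A1.
q_edge = 3.0 / 320.0                         # q_{m-s} for Spencer's average
p_free = (-1) ** s * (-a[-1])                # term of p(z) free of z
c = q_edge / p_free
A1 = np.eye(m)
for i in range(m):
    for j in range(i + 1, m):
        A1[i, j] = -a[j - i - 1]             # entry -a_{j-i} above the diagonal
corner = np.eye(m) - c * A1.T @ A1
print(np.round(corner[0], 4))
```

In particular the first row of the corner gives $g_{11} = 1 - c$ and $g_{1,1+l} = c\,a_l$, so the first smoothed value is a weighted combination of the first m + 1 observations only.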
Table 2. Moving-Average Coefficients (c_j) and Extension Coefficients (a_j) of Minimum-R_3 ("Henderson's Ideal") Averages of 5 to 23 Terms Exact for Cubics

Number of Terms
        5                    7                    9                    11                   13
 j    c_j       a_j        c_j       a_j        c_j       a_j        c_j       a_j        c_j       a_j

 0   .559440               .412588              .331140              .277944              .240058
 1   .293706   2           .293706  1.618034    .266557  1.352613    .238693  1.160811    .214337  1.016301
 2  -.073426  -1           .058741  -.236068    .118470   .114696    .141268   .281079    .147356   .360880
 3                        -.058741  -.381966   -.009873  -.287231    .035723  -.140968    .065492  -.021625
 4                                             -.040724  -.180078   -.026792  -.204545    0        -.160909
 5                                                                  -.027864  -.096377   -.027864  -.13833?
 6                                                                                       -.019350  -.056317

Table 2. Moving-Average Coefficients (c_j) and Extension Coefficients (a_j) of Minimum-R_3 ("Henderson's Ideal") Averages of 5 to 23 Terms Exact for Cubics (continued)

Number of Terms
        15                   17                   19                   21                   23
 j    c_j       a_j        c_j       a_j        c_j       a_j        c_j       a_j        c_j       a_j

 0   .211542               .189232              .171266              .156470              .144060
 1   .193742   .903661     .176390   .813444    .161691   .739580    .149136   .678000    .138319   .625880
 2   .145904   .397295     .141112   .410885    .134965   .412090    .128423   .406495    .121949   .397207
 3   .082918   .064751     .092293   .124932    .096658   .166162    .097956   .193174    .097395   .212501
 4   .024028  -.100710     .042093  -.043456    .054685   .005097    .063038   .046016    .068303   .075236
 5  -.014134  -.135445     .00246?  -.110644    .017474  -.078255    .029628  -.046290    .038933  -.015313
 6  -.024499  -.094424    -.018640  -.106213   -.008155  -.099972    .003119  -.084020    .013430  -.063927
 7  -.013730  -.035128    -.020370  -.065896   -.018972  -.081843   -.012896  -.084711   -.004948  -.078737
 8                        -.009961  -.023052   -.016601  -.047103   -.017614  -.063086   -.014527  -.070064
 9                                             -.007378  -.015756   -.013455  -.034444   -.01568?  -.048977
10                                                                  -.005570  -.011134   -.010918  -.025714
11                                                                                       -.004278  -.008092

Calculated by formula (7.5).


Table 3. Moving-Average Coefficients (c_j) and Extension Coefficients (a_j) of Selected Moving Averages

       Macaulay^a         Spencer 15-Term^b   Woolhouse^c        Hardy^d            Higham^e           Karup^f
 j   864c_j    a_j        320c_j    a_j       125c_j    a_j      120c_j    a_j      125c_j    a_j      625c_j    a_j

 0    182                   74                  25                 24                 25                125
 1    171    .919760        67    .961572       24    .885108      22    .739988      24    .859550     114    .820240
 2    127    .393023        46    .372752       21    .421982      17    .386211      18    .399283      87    .402924
 3     72    .055273        21    .015904        7    .028721      10    .124325      10    .087040      53    .114622
 4     17   -.113111         3   -.123488        3   -.076050       4   -.023648       3   -.072738      21   -.047133
 5    -17   -.140462        -5   -.125229        0   -.107285       0   -.080087       0   -.104527       0   -.102491
 6    -19   -.084512        -6   -.075887       -2   -.092723      -2   -.079459      -2   -.093953      -8   -.091791
 7    -10   -.029971        -3   -.025624       -3   -.059753      -2   -.049327      -2   -.055312      -9   -.060239
 8                                                                 -1   -.018003      -1   -.019343      -6   -.028636
 9                                                                                                       -2   -.007496

a Macaulay 1931, p. 55, footnote 2.
b Macaulay 1931, p. 55; Henderson 1938, p. 53.
c Henderson 1938, p. 53.
d Henderson 1938, p. 53; Benjamin and Haycocks 1970, p. 238.
e Henderson 1938, p. 53.
f Henderson 1938, p. 53.

Table 3. Moving-Average Coefficients (c_j) and Extension Coefficients (a_j) of Selected Moving Averages (continued)

       Andrews^a          Spencer 21-Term^b   Hardy Wave-Cutting  Vaughan             Kenchington^e
                                              Formula^c           Formula A^d
 j   10080c_j   a_j       350c_j    a_j       65c_j     a_j       1440c_j   a_j       385c_j    a_j

 0    1688                  60                   5                  182                  45
 1    1579    .700747       57    .729724        5    .480996       179    .593256       44    .527740
 2    1325    .406808       47    .40870?        6    .368708       170    .396409       41    .370688
 3     950    .179749       33    .167281        7    .267940       149    .230238       36    .236445
 4     551    .027155       18    .009255        7    .166506       115    .096761       30    .128638
 5     225   -.054586        6   -.069703        6    .072964        72   -.000857       22    .043118
 6      -4   -.083701       -2   -.091513        4   -.008222        29   -.060076       13   -.018390
 7    -124   -.078256       -5   -.076165        1   -.075454        -5   -.083321        5   -.053902
 8    -135   -.054368       -5   -.049051       -1   -.097387       -26   -.079596       -1   -.067080
 9    -110   -.031120       -3   -.022502       -2   -.089039       -29   -.056662       -5   -.064844
10     -61   -.012428       -1   -.006033       -2   -.062016       -19   -.028557       -6   -.050323
11                                              -1   -.024996        -6   -.007595       -5   -.032035
12                                                                                       -3   -.015626
13                                                                                       -1   -.004429

a Andrews and Nesbitt 1965, p. 18.
b Macaulay 1931, p. 51; Henderson 1938, p. 53.
c Benjamin and Haycocks 1970, p. 239.
d Vaughan 1933, p. ?07.
e Henderson 1938, p. 53.


5.  THE  WHITTAKER  GRADUATION  PROCESS 


It  is  not  the  purpose  of  this  paper  to  consider  the  Whittaker  (1923;  see  also 
Henderson  1924)  graduation  process  in  detail.  However,  since  the  natural  method  of  ex- 
tension of  MWA  graduation  to  the  extremities  of  the  data  was  arrived  at  primarily  on  the 
basis  of  an-  aies  to  the  Whittaker  method,  the  latter  must  be  described  sufficiently  to 
make  these  t ,ies  clear.  The  objective  of  the  Whittaker  process  is  to  choose  graduated 
values  Uj  ( = A,  A + 1,  . . . , B)  in  such  a way  as  to  minimize  the  quantity 


B jj-s 

l - y.)2  + g l (As  u )2  , 

j=A  3 3 3 j=A  3 


(5.1) 


where  the  weights  W^,  the  positive  constant  g,  and  the  positive  integer  s are  chosen 
a priori  by  the  user.  The  solution  is  most  conveniently  expressed  in  matrix  notation  as 
follows  (Greville  1957,  1974a).  Let  W denote  the  diagonal  matrix  of  order  N whose 
successive  diagonal  elements  are  the  W^,  let  u and  y be  defined  as  in  Section  2,  and 
let  K denote  the  rectangular  matrix  of  N - s rows  and  N columns  that  transforms  a 
vector  v into  the  vector  of  sth  finite  differences  of  its  components.  Clearly  the  non- 
zero elements  of  K are  binomial  coefficients  of  order  s with  alternating  signs  (Greville 
1974a).  Then,  the  expression  (5.1)  can  be  written  in  the  form 


(u  - y)T  W(u  - y)  + g(Ku)T  Ku  , (5.2) 

where  the  superscript  T denotes  the  transpose.  It  is  easily  seen  (Greville  1774a)  that 

(5.2)  is  smallest  when  u satisfies 

(W  + gKT  K) u = Wy  . (5.3) 

It  is  not  difficult  to  show  (Greville  1957,  1974a)  that  the  matrix  in  the  left  member  of 

(5.3)  is  nonsingular  (in  fact,  positive  definite)  and  therefore 

u *=  (W  + gKT  K)’1  Wy  . 

The remaining discussion will be limited to the so-called "Type A" case, in which all
the weights $W_j$ are taken equal to unity, as this case has the greatest similarity to
MWA graduation. Here W = I (the identity), and it is easily verified (Noble 1969, p. 147)
that

$$(I + g K^T K)^{-1} = I - K^T (g^{-1} I + K K^T)^{-1} K . \tag{5.4}$$

If the entire process of graduation, by whatever method or criterion, including data
near the ends, is conceived in terms of matrix-vector multiplication (Greville 1957), so
that

$$u = Gy \tag{5.5}$$

for  some  matrix  G,  (5.4)  suggests  that  it  may  be  reasonable  to  consider  matrices  G of 
the  form 

$$G = I - K^T D K \tag{5.6}$$

for  some  square  matrix  D and  some  order  of  differences  s . 

6.  MATRIX  DEVELOPMENT  OF  THE  NATURAL  METHOD  OF  COMPLETING  THE  GRADUATION 

We suppose that N equally spaced observed values $y_j$ ($j = A, A + 1, \ldots, B$) are to
be graduated primarily by means of a given symmetrical MWA of 2m + 1 terms of the form
(1.2), that is exact for the degree 2s - 1. We assume that N > 2m. In other words,
graduated values $u_j$ for $j = A + m, A + m + 1, \ldots, B - m$ will be calculated from the
given weighted average. This requirement fixes the elements of the matrix G of (5.5) and
(5.6) with the exception of the first m and the last m rows. The nonzero elements of
each of the remaining N - 2m rows will be merely the weights in the moving average with
the middle weight on the diagonal in each case.
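
The part of G that is fixed by this requirement can be sketched as follows. The helper
name interior_rows and the use of None for the undetermined rows are illustrative
assumptions; the 5-term weights in the usage lines are those that formula (7.5) yields
for m = 2.

```python
from fractions import Fraction


def interior_rows(c, n):
    """Rows m..n-m-1 (0-based) of G for symmetric weights c = [c_{-m}, ..., c_m].

    The first m and the last m rows are returned as None: they are exactly what
    the natural extension is meant to supply.
    """
    m = (len(c) - 1) // 2
    rows = [None] * n
    for i in range(m, n - m):
        row = [Fraction(0)] * n
        for j, weight in enumerate(c):
            row[i - m + j] = Fraction(weight)   # the middle weight lands on the diagonal
        rows[i] = row
    return rows


# Usage: 5-term minimum-R_3 weights (m = 2) applied to N = 9 observations.
c5 = [Fraction(v, 286) for v in (-21, 84, 160, 84, -21)]
rows = interior_rows(c5, 9)
```

Because these weights are exact for cubics (s = 2), each of the N - 2m determined rows
reproduces the ordinates of a cubic exactly.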

Our  determination  of  the  elements  of  the  first  m and  the  last  m rows  of  G will 
be  based  on  the  general  requirement  that  these  rows  shall  not  be  something  extra  grafted 
onto  the  main  part  of  the  matrix,  but  shall  be  an  integral  part  of  an  overall  matrix  having 
a well  defined  structure,  this  structure  having  the  greatest  possible  analogy  to  that  of 
the  corresponding  matrix  for  the  Whittaker  process.  We  shall  try  to  show  that  this  general 
requirement  leads  almost  inexorably  to  the  following  three  assumptions  about  G for  the 
MWA  case: 

(i)  G is  symmetric  and  of  the  form  (5.6); 

(ii) D is such that $D^{-1}$ exists and is a Toeplitz matrix (to be defined presently);

(iii) the elements of $D^{-1} = (d'_{ij})$ are given by

$$d'_{ij} = h_{i-j} , \tag{6.1}$$

where $h_j = h_{-j}$ is a coefficient in the Laurent expansion

$$h(z) = [q(z)]^{-1} = \sum_{j=-\infty}^{\infty} h_j z^j , \tag{6.2}$$

convergent in an annulus containing the unit circle.

These three assumptions (together with the assumption stated in the first paragraph
of this section about the rows of G other than the first m and the last m) uniquely
determine G. The three assumptions require extensive discussion, explanation, and
comment, on which we now embark.

Since analogy to the Whittaker process is to have the highest priority, and (5.4)
shows that G for that process is clearly symmetric and of the form (5.6), these being
very basic structural properties, there can be little question about assumption (i). This
assumption implies that G is a diagonal band matrix of band width 2m + 1, and its
elements are now determined except for a square submatrix of order m in the upper left
corner and a similar submatrix in the lower right corner. It also implies that D is
symmetric and is a diagonal band matrix of band width 2m - 2s + 1.

It may be mentioned here that there is one basic, unavoidable difference between the
Whittaker process and the MWA process. This is that, while in the MWA process (with the
natural extension) G is a diagonal band matrix, in the Whittaker process it is the
inverse of such a matrix. In consequence of this difference, the Whittaker process is
"global" (each graduated value depending on all the observed values), while MWA is "local"
(each graduated value depending only on a few neighboring observed values). This
distinction carries over to the related matrix D, which, in the Whittaker process, is not a
diagonal band matrix but the inverse of such a matrix (of band width 2s + 1); from (5.4),

$$D^{-1} = g^{-1} I + K K^T . \tag{6.3}$$


Assumption (i) fixes the elements of D except for those in a square submatrix of
order m - s in the upper left corner and a similar submatrix in the lower right corner.
Reverting to the expression q(E) of (3.2) and (3.3), we have $D = (d_{ij})$, with

$$d_{ij} = q_{i-j} \tag{6.4}$$

except within the two submatrices mentioned.

We define a Toeplitz matrix (see Trench 1974) as one in which all the elements on any
diagonal line extending downward and to the right are equal. In other words, $T = (t_{ij})$
is a Toeplitz matrix when

$$t_{ij} = t_{i-j}$$

for all i and j.

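It is easily verified that $D^{-1}$ for the Whittaker process, given by (6.3), is a
Toeplitz matrix. In fact, if $D^{-1} = (d'_{ij})$,

$$d'_{ij} = (-1)^{i-j} \binom{2s}{s+i-j} + g^{-1} \delta_{ij} ,$$

where $\delta_{ij}$ is a Kronecker symbol and the binomial coefficient is taken as zero when
|i - j| > s.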
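
The Toeplitz character of $K K^T$, and hence of (6.3), can be checked directly for small
cases. In the sketch below the helper names are illustrative assumptions, and the closed
form compared against, $(-1)^{i-j} \binom{2s}{s+i-j}$, is the standard convolution identity
for binomial coefficients rather than a formula quoted from the text.

```python
from math import comb


def difference_matrix(n, s):
    """(n - s) x n matrix of s-th differences, with integer entries."""
    return [[(-1) ** (s - (j - i)) * comb(s, j - i) if 0 <= j - i <= s else 0
             for j in range(n)] for i in range(n - s)]


def gram(k):
    """Return K K^T for a matrix K given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in k] for ri in k]


def is_toeplitz(t):
    """True when every diagonal running downward and to the right is constant."""
    return all(t[i][j] == t[i - 1][j - 1]
               for i in range(1, len(t)) for j in range(1, len(t[0])))


# Usage: third differences (s = 3) of vectors with N = 10 components.
kkt = gram(difference_matrix(10, 3))
```

Adding $g^{-1} I$ only changes the main diagonal by a constant, so the Toeplitz property
of $K K^T$ carries over to $D^{-1}$ in (6.3).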

Now, it is clear that the Toeplitz property is a very striking and obvious property
of those matrices which possess it. Thus, in pursuit of our goal of maximum analogy
between the Whittaker and MWA processes, we would wish, if at all possible, to make $D^{-1}$ a
Toeplitz matrix in the MWA case. Accordingly, let $D^{-1} = (d'_{ij})$ with $d'_{ij} = d'_{i-j}$ for all
i and j. Since D is symmetric, $D^{-1}$ is symmetric and $d'_{-j} = d'_j$. Consider the
series

$$f(z) = \sum_{j=-\infty}^{\infty} d'_j z^j \tag{6.5}$$

(which may or may not converge). Because of (6.4) this series is a "reciprocal" of q(z)
at least in the formal sense that if q(z) and f(z) are formally multiplied together,
the product is unity.

The  latter  fact  does  not  uniquely  determine  the  series  (6.5).  In  order  to  achieve 
a unique  determination,  we  invoke  a further  analogy  between  the  MWA  and  Whittaker  processes. 
We  require  that  in  the  MWA  case  this  series  converge  in  some  region  of  the  complex  plane. 

The corresponding series for the Whittaker case is finite, and therefore converges
everywhere.

Now, a Laurent series like (6.5), if it converges anywhere, converges in an annulus.
Because of the symmetry of the coefficients, if it converges for $z = z_0$, it converges
for $z = z_0^{-1}$. Therefore, the annulus of convergence, if it exists, contains the unit
circle. Moreover, $[q(z)]^{-1}$ has a Laurent expansion (6.2) convergent in an annulus
containing the unit circle if and only if q(z) has no zero on the unit circle.
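
A minimal numerical sketch of the expansion (6.2) follows. It approximates the contour
integral for each $h_j$ by an equally spaced sum over the unit circle, which is an
assumption of this example and not the computational method of the paper; the function
name, the sample count, and the illustrative q used in the usage lines are likewise
hypothetical.

```python
import cmath
import math


def laurent_coefficients(q, jmax, samples=256):
    """Approximate h_j (|j| <= jmax) in 1/q(z) = sum_j h_j z^j on the unit circle.

    q maps each exponent k to the coefficient q_k of z^k.
    """
    values = []
    for n in range(samples):
        z = cmath.exp(2j * math.pi * n / samples)
        qz = sum(coef * z ** k for k, coef in q.items())
        if abs(qz) < 1e-12:
            # Crude guard: only detects a zero that falls on a sampling point.
            raise ValueError("q(z) vanishes on the unit circle")
        values.append(1 / qz)
    h = {}
    for j in range(-jmax, jmax + 1):
        total = sum(v * cmath.exp(-2j * math.pi * j * n / samples)
                    for n, v in enumerate(values))
        h[j] = (total / samples).real
    return h


# Usage with a hypothetical symmetric q(z) = 0.25/z + 1 + 0.25 z.
q = {-1: 0.25, 0: 1.0, 1: 0.25}
h = laurent_coefficients(q, 6)
```

For this q the zeros of z q(z) are $-2 \pm \sqrt{3}$, neither of which lies on the unit
circle, so the expansion exists; multiplying the computed series by q(z) term by term
should return unity up to rounding.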

Thus,  assumption  (iii)  is  the  only  possible  assumption  consistent  with  assumptions 
(i)  and  (ii)  that  satisfies  the  requirement  that  (6.5)  converge  in  some  part  of  the  plane, 
and  assumption  (iii)  implies  that  q(z)  has  no  zeros  on  the  unit  circle.  The  prohibition 
against  such  zeros  of  q(z)  was  previously  alluded  to  in  Section  3,  and  further  reasons 
for insisting on it will be given in Section 10.

In reality, the part of assumption (ii) that asserts the nonsingularity of D is
redundant, because it is shown in Section 9 that if a Toeplitz matrix T ($= D^{-1}$) is
constructed in accordance with assumption (iii), then the square submatrices of order m - s
in the upper left and lower right corners of D can be chosen so that DT = I.

In  the  typical  case  D is  a matrix  of  nonnegative  elements  (this  is  true  in  the 
Whittaker  case),  but  this  is  not  a requirement.  (It  is  not  true  of  Hardy's  formula.) 

The matrix-vector formulation does not lead at once to a convenient method for
calculating the graduated values near the ends of the data. It will be shown in Section 9 to
be equivalent to the extension algorithm described in Section 3, and also to the method of
calculating the atypical elements of G described in Section 4.

7.  SPECIAL  CLASSES  OF  MOVING  AVERAGES 


Of particular interest are those moving averages known to actuaries as minimum-$R_3$
formulas and to economic statisticians as "Henderson's ideal" formulas. For a given number
of terms 2m + 1, this is the average (1.2), exact for the third degree, for which the
quantity

$$\sum_{j=-m-3}^{m} (\Delta^3 c_j)^2 \tag{7.1}$$

is smallest (with the understanding that $c_j = 0$ for $|j| > m$). The "smoothing
coefficient" $R_3$ is defined as the quantity obtained by dividing (7.1) by 20 and taking the
square root. The divisor 20 is chosen because this is the value of (7.1) for the trivial
case of (1.2) in which $c_0 = 1$ and $c_j = 0$ for $j \neq 0$.


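The rationale for minimizing (7.1) may be explained as follows (Greville 1974a). If
$u_x$, $u_{x+1}$, $u_{x+2}$, and $u_{x+3}$ are all calculated by (1.2), that is, for
$x = A + m$ to $B - m - 3$, inclusive, then

$$\Delta^3 u_x = -\sum_{j=-m-3}^{m} (\Delta^3 c_j)\, y_{x+j+3} . \tag{7.2}$$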


It has been customary to regard the smallness (in absolute value) of the third differences
of the graduated values as an indication of smoothness. Therefore (7.2) suggests that
smoothness is encouraged by making the quantities $\Delta^3 c_j$ numerically small, and minimizing
(7.1) is a way of doing this. The formula corresponding to (7.2) for a general order of
differences is

$$\Delta^s u_x = (-1)^s \sum_{j=-m-s}^{m} (\Delta^s c_j)\, y_{x+j+s} , \tag{7.3}$$


and the general formula for $R_s$ is

$$R_s^2 = \sum_{j=-m-s}^{m} (\Delta^s c_j)^2 \bigg/ \binom{2s}{s} . \tag{7.4}$$
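
As a hedged illustration of (7.4), the sketch below pads the weights with zeros, takes
s-th differences, and divides by the central binomial coefficient. The function name is
an assumption of this example, and the 5-term weights used in the usage lines are the
values that (7.5) yields for m = 2.

```python
from fractions import Fraction
from math import comb, sqrt


def smoothing_coefficient(c, s):
    """R_s of (7.4) for weights c = [c_{-m}, ..., c_m], taking c_j = 0 for |j| > m."""
    padded = [Fraction(0)] * s + [Fraction(v) for v in c] + [Fraction(0)] * s
    diffs = padded
    for _ in range(s):                      # apply the forward difference s times
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sqrt(sum(d * d for d in diffs) / comb(2 * s, s))


# Usage: 5-term minimum-R_3 weights.
c5 = [Fraction(v, 286) for v in (-21, 84, 160, 84, -21)]
r0 = smoothing_coefficient(c5, 0)
r3 = smoothing_coefficient(c5, 3)
```

For the trivial average with $c_0 = 1$ the sum of squared third differences is
1 + 9 + 9 + 1 = 20, so the function returns $R_3 = 1$, in agreement with the choice of
divisor described for (7.1).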


There is some question whether Henderson's contribution warrants attaching his name
to the "ideal" weighted averages. De Forest (1873) treated extensively the formulas that
minimize $R_4$. The concept of choosing the coefficients in order to minimize $R_3$
seems to have been first mentioned by G. F. Hardy (1909). These averages were fully
discussed by Sheppard (1913) slightly earlier than by Henderson (1916). However, Henderson
does seem to have been the first to give an explicit formula for the coefficient $c_j$ in
the weighted average minimizing $R_3$ (Henderson 1916, p. 43; Macaulay 1931, p. 54;
Henderson 1938, p. 60; Miller 1946, p. 71; Greville 1974a, p. 18). If we write k = m + 2,
so that the weighted average has 2k - 3 terms, the formula is


$$c_j = \frac{315\, [(k - 1)^2 - j^2]\, (k^2 - j^2)\, [(k + 1)^2 - j^2]\, (3k^2 - 16 - 11 j^2)}{8k\, (k^2 - 1)(4k^2 - 1)(4k^2 - 9)(4k^2 - 25)} . \tag{7.5}$$
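
Formula (7.5) is easily evaluated in exact arithmetic. The following sketch does so; the
function name henderson_weights is an assumption of this example.

```python
from fractions import Fraction


def henderson_weights(m):
    """Weights [c_{-m}, ..., c_m] of the (2m + 1)-term minimum-R_3 average, from (7.5)."""
    k = m + 2
    denom = 8 * k * (k**2 - 1) * (4 * k**2 - 1) * (4 * k**2 - 9) * (4 * k**2 - 25)

    def c(j):
        numer = (315 * ((k - 1)**2 - j**2) * (k**2 - j**2) * ((k + 1)**2 - j**2)
                 * (3 * k**2 - 16 - 11 * j**2))
        return Fraction(numer, denom)

    return [c(j) for j in range(-m, m + 1)]
```

For m = 2 this gives the weights -21/286, 42/143, 80/143, 42/143, -21/286. In each case
the weights sum to unity and their second moment vanishes, as exactness for cubics
requires of a symmetrical average.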


Weighted averages that minimize $R_s$ have been discussed from other points of view
by Wolfenden (1925), Schoenberg (1946), and Greville (1966, 1974b).


Also deserving of special mention are the averages (exact for cubics) that minimize
$R_0$, sometimes called "formulas of maximum weight" or "Sheppard's ideal" formulas. These
are sometimes applied to physical measurements when the errors of observation can be
regarded as random "white noise" (see discussion of "reduction of error" in Section 8). The
weights are given by

$$c_j = \frac{3 (3m^2 + 3m - 1) - 15 j^2}{(2m - 1)(2m + 1)(2m + 3)} .$$

Weighting coefficients ($c_j$) and extension coefficients ($a_j$) for minimum-$R_3$
(Henderson's ideal) averages of 5, 7, ..., 23 terms are given in Table 2.
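
The maximum-weight (minimum-$R_0$) formula given earlier in this section can be evaluated
in the same exact manner. The function name below is an assumption of this sketch.

```python
from fractions import Fraction


def maximum_weight_coefficients(m):
    """Weights [c_{-m}, ..., c_m] of the (2m + 1)-term cubic-exact average minimizing R_0."""
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [Fraction(3 * (3 * m**2 + 3 * m - 1) - 15 * j**2, denom)
            for j in range(-m, m + 1)]
```

For m = 2 the weights are -3/35, 12/35, 17/35, 12/35, -3/35. For every m the weights sum
to unity and have zero second moment, which is the condition for a symmetrical average to
be exact for cubics.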

8.  COMPARISON  WITH  OTHER  METHODS.  PRACTICAL  CONSIDERATIONS 

If  a symmetrical  MWA  exact  for  the  degree  2s  - 1 is  being  used  to  smooth  the  main 
part  of  the  data,  it  can  easily  be  deduced,  either  from  the  extension  algorithm  described 
in  Section  3 or  from  the  matrix  formulation  of  (5.5)  and  (5.6)  that  the  unsymmetrical 
weightings  proposed  for  smoothing  the  first  m and  the  last  m observations  are  exact 
only  for  the  degree  s - 1 . For  example,  all  the  averages  represented  in  Tables  2 and  3 
with  the  exception  of  Hardy's  are  exact  for  cubics,  and  therefore  their  extensions  to 
values near the ends are exact only for linear functions. Hardy's weighted average is
exact for linear functions and its extension only for constants.

The  Whittaker  process  has  a similar  property.  At  a sufficient  distance  from  the 
ends  of  the  data,  polynomials  of  degree  2s  - 1 are  "almost"  reproduced  by  that  process. 
In  support  of  this  rather  loose  statement  the  following  heuristic  argument  is  advanced. 


For the Whittaker process

$$G = (I + g K^T K)^{-1} = I - g G K^T K .$$

Thus, if y is the vector of observed values, the vector of corrections to these values is

$$-g G K^T K y .$$

Now, the nonzero elements of $K^T K$, with the exception of the first s and the last s
rows, are binomial coefficients of order 2s with alternating signs. Therefore the
components of $K^T K y$, except for the first s and the last s, are (2s)th differences of
those of y (or their negatives if s is odd). Thus, if y is a vector of ordinates of
a polynomial of degree 2s - 1, $K^T K y$ is a vector of zeros except for the first s and
the last s components. The components of $G K^T K y$ are graduated values of those of
$K^T K y$, and therefore should be very small at some distance from the extremities of the
data. Finally, multiplication by g, even though g is typically large, should give
small corrections at a sufficient distance from the ends of the data.
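
The first step of this heuristic argument can be checked exactly in integer arithmetic.
The sketch below forms $K^T K y$ for polynomial ordinates; the helper names are
assumptions of this example.

```python
from math import comb


def difference_matrix(n, s):
    """(n - s) x n matrix of s-th differences, with integer entries."""
    return [[(-1) ** (s - (j - i)) * comb(s, j - i) if 0 <= j - i <= s else 0
             for j in range(n)] for i in range(n - s)]


def ktk_times(y, s):
    """Return K^T K y for the s-th difference matrix K of matching size."""
    n = len(y)
    k = difference_matrix(n, s)
    ky = [sum(a * b for a, b in zip(row, y)) for row in k]           # K y
    return [sum(k[r][col] * ky[r] for r in range(n - s)) for col in range(n)]


# Usage: ordinates of the cubic x^3 at x = 0, 1, ..., 40, with s = 2.
v = ktk_times([x ** 3 for x in range(41)], 2)
```

Only the first s and the last s components of the result can be nonzero when y is a
polynomial of degree 2s - 1, so the corrections arise entirely from what the graduation
matrix G propagates inward from the ends.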

Some users may consider the reduction in degree of exactness near the ends of the data
a disadvantage of the natural method of extension. Before I became aware of the natural
method, I had proposed (Greville 1974a) a different method of extension (already mentioned
in Section 2) that does not have this particular disadvantage (though it has other
shortcomings). This involves extrapolation by a polynomial of degree 2s - 1 fitted by least
squares to the first m + 1 observations. A similar polynomial is fitted to the last
m + 1 observations for extrapolation at the other end of the data. There may be a gain in
simplicity in using a single method of extrapolation for all symmetrical weighted averages,
the particular extrapolated values depending only on the number of terms in the main
formula. However, there is a loss in that the extension method is no longer tailored to the
particular symmetrical average used.

Like the natural method of extension, the method using extrapolation by least squares
can be collapsed into a single matrix G. When this is done, the diagonal band character
of the smoothing matrix is maintained, but the symmetry is lost. Though the matrix
approach is less convenient for computational purposes, the differences between the two
methods are best elucidated by comparing the first m rows of the respective matrices G.
This is done in Tables 4 and 5 for the case of the 9-term "ideal" formula. Here m = 4,
but for convenience the fifth row is also shown. Its elements would be repeated in the
subsequent rows, moving successively to the right, until we come to the last four rows.
While an average of as few as 9 terms would seldom be used in practice, this is a
convenient illustration.

As previously indicated, the first m rows and the last m rows of G may be
regarded as exhibiting unsymmetrical weighted averages which are to be used near the ends of
the data to supplement the symmetrical average used elsewhere. The coefficients that
appear in the last m rows are the same as those in the first m rows, but the order is
reversed, both horizontally and vertically. It should be noted that the coefficients in
the supplemental averages depend only on those of the underlying symmetrical average. They
do not depend on N, the number of observations in the data set (which is the order of G).

The coefficients in the supplemental weighted averages based on least-squares
extrapolation, exhibited in Table 5, show two undesirable features. These are negative
coefficients of substantial numerical magnitude, and successive waves of positive and negative
coefficients as one proceeds from left to right along the rows. The number of such waves
would increase as the number of terms in the underlying formula increases.

In  striking  contrast  is  the  character  of  the  coefficients  of  the  natural  extension. 
Like  the  coefficients  in  the  underlying  symmetrical  formula,  each  row  exhibits  a peak  in 
the vicinity of the main diagonal of the matrix, tapering off to a single group of
negative coefficients of reduced size near the edge of the diagonal band.

In the least-squares method only a very small correction is made to the initial
observed value. The corresponding correction in the natural method is more substantial.

The "second-difference correction" is the coefficient of the second-difference term
when the formula is expressed in terms of increasing orders of differences in the form

$$u_x = y_x + c\, \Delta^2 y_{x-h} + \cdots .$$

The coefficient c does not depend on the subscript x - h, in which there is some
freedom of choice. For the formulas based on least-squares extrapolation, which are exact for
cubics, the fourth-difference correction is similarly defined.

Some writers (Miller 1946, Wolfenden 1942, Greville 1974a) have regarded the
observed values $y_x$ as the sum of "true" values $U_x$ and superimposed random errors $e_x$.
If it is assumed that the errors $e_x$ for different x are uncorrelated, and have zero
mean and constant variance $\sigma^2$ for all x, then the variance of the error in the smoothed
value $u_x$ is $R_0^2 \sigma^2$, where $R_0^2$ is obtained by taking s = 0 in (7.4). Thus, $R_0$ may
be interpreted as the ratio of reduction in the standard deviation of error that results
from application of the weighted average.

While the assumptions underlying the preceding analysis may be questioned,
nevertheless a good case can be made that, for any weighted average, $R_0$ should be less than
unity. Since $R_0^2$ is the sum of the squares of the coefficients in the average, $R_0$ can
never be less than the maximum of the absolute values of the coefficients. Thus, an
average cannot be considered satisfactory if the absolute value of any coefficient is equal to
or greater than unity.


As indicated in Section 7, it has long been customary to regard a graduation as smooth
if the third differences of the graduated values are small in absolute value. If $G = (g_{ij})$,
we have

$$u_{A+i-1} = \sum_{j=1}^{N} g_{ij}\, y_{A+j-1} ,$$

and therefore

$$\Delta^s u_{A+i-1} = \sum_{j=1}^{N} y_{A+j-1}\, \Delta_i^s g_{ij} , \tag{8.1}$$

where the subscript of $\Delta$ indicates that the differences are taken with respect to i
(i.e., down the columns of the matrix). If one avoids the corner submatrices, the nonzero
elements $g_{ij}$ in (8.1) are merely coefficients in the underlying symmetrical average, and
(8.1) reduces to (7.3). This was the rationale underlying the derivation of the
minimum-$R_s$ averages.

Of  course,  if  G is  symmetric,  it  makes  no  difference  whether  the  differences  are 
taken  horizontally  or  vertically.  When  the  symmetry  of  G is  not  assumed,  care  must  be 
exercised. Many years ago (Greville 1947, 1948) I published what purported to be
coefficients in supplemental averages to be used near the ends of the data in conjunction with
minimum- and  minimum- R^  symmetrical  averages.  The  symmetry  of  G was  not  assumed,  and 
I made the error of deriving the unsymmetrical coefficients by minimizing their third
differences taken horizontally. The tables in question are therefore based on an incorrect
assumption.  Further  it  may  be  mentioned  in  passing  that  in  the  1947-8  formulation  the 
diagonal  band  character  was  not  maintained,  since  the  supplemental  averages  contained  the 
full  2m  + 1 terms. 

Table 6 shows, for the natural and the least-squares extensions of the 9-term
minimum-$R_3$ formula, those third differences of the matrix elements, taken vertically, that
involve elements of the first five rows. The entries in the fifth row of Table 6 would
be repeated in subsequent rows, moving successively to the right. Casual inspection of
the table shows that the third differences are numerically smaller for the natural
extension. All of these third differences are less than 0.14 in absolute value. Two of those
for the least-squares extension exceed 0.7 in absolute value.

It  is  instructive  to  compare  the  natural  extension  with  the  least-squares  extension 
for  the  numerical  example  of  Section  3.  Though  neither  extension  is  recommended  for  use 
when  additional  data  are  available  beyond  the  range  of  the  original  data  set,  nevertheless 
it  may  be  of  interest,  purely  for  purposes  of  illustration,  to  choose  a numerical  example 
in  which  such  additional  data  are  available,  and  this  has  been  done. 

Table 7 and Figures B and C show, for the first seven months of 1967 and the last
seven months of 1971, the observed values of precipitation in Madison, Wisconsin, and the
graduated values obtained by (i) natural extension of Spencer's 15-term average, (ii)
least-squares extension of the same average, and (iii) use of additional data. It will be
noted that the least-squares extension is strongly constrained toward each of the two
terminal observations (January 1967 and December 1971). This may be explained by the fact
that all the values y in (1.2) that enter into the calculation of these graduated values
are included in either the m + 1 observations to which the least-squares cubic was fitted
or the m extrapolated values obtained from the same cubic. On the other hand, the natural
extension and the least-squares extension are very close together at the interface with the
graduated values calculated in the standard manner. Thus, for the months of July 1967 and
June 1971, all but one of the values y entering into the computation (1.2) are identical
for the two methods.

For the months closer to the interface the graduated values obtained by introducing
additional data are close to those of the natural extension. This is because the
supplemental unsymmetrical averages produced by the natural extension (unlike those of the
least-squares extension) give relatively small weight to the observations more remote from the
one being graduated (as does the underlying symmetrical formula). For example, the values
for the natural extension and those obtained by the use of additional data are
indistinguishable in Figure B for April to July 1967. In the last months of 1971 the deviation
is greater because the first two months of 1972 were exceptionally dry. This could not
have been predicted from the data for preceding months.
Data to First Seven and Last Seven Months by Different Methods

                                      Extension of Graduation by
Year and       Observed      Natural      Least-Squares      Additional
Month          Value         Method       Cubic              Data

1967
  January        1.63          1.11          1.62               1.56
  February       1.17          1.63          0.98               1.84
  March          1.49          2.24          1.37               2.29
  April          2.57          2.88          2.32               2.85
  May            3.53          3.42          3.07               3.36
  June           6.46          3.74          3.61               3.70
  July           2.51          3.85          3.82               3.84

1971
  June           2.27          2.02          2.00               2.05
  July           1.65          2.13          2.03               2.23
  August         3.96          2.24          2.00               2.39
  September      1.87          2.40          1.97               2.51
  October        1.30          2.63          2.08               2.50
  November       3.48          2.84          2.58               2.31
  December       3.64          3.28          3.85               2.04
[Figure B. Observed and Graduated Values of Monthly Precipitation, Madison, Wisconsin, January to ...]

[Figure C. Observed and Graduated Values of Monthly Precipitation, Madison, Wisconsin, July to December, 1971]


Table 8 gives certain parameters for the various symmetrical weighted averages that
have been mentioned previously. The column headed "Error" requires explanation. This is
the error committed when the formula in question is used to "smooth" a polynomial of degree
four. This naturally tends to increase with the number of terms in the formula. Both $R_0$
and $R_3$ tend to decrease with increasing number of terms. Though the "ideal" formulas
have been derived to minimize $R_3$, they tend to produce small values of $R_0$ as well. In
only one instance (Vaughan) does a "name" formula have a smaller $R_0$ than the ideal formula
of the same number of terms. The late Hubert Vaughan was an unusually keen analyst of MWA
smoothing.

It may be mentioned in passing that some writers (e.g., Henderson 1938) call the
reciprocal of $R_0^2$ the "weight" and the reciprocal of $R_3$ the (smoothing) "power."

9. PROOF OF EQUIVALENCE OF THE MATRIX AND INTERMEDIATE-VALUE APPROACHES

Though this proof involves only elementary mathematics, it is fairly long and
complicated, and is therefore organized in the form of three lemmas and a theorem.

Let

$$p(z) = z^{m-s} - \sum_{j=1}^{m-s} p_j z^{m-s-j} ,$$

where p(z) is the polynomial defined in Section 3 whose zeros are the zeros of q(z)
located inside the unit circle.

Lemma 9.1. The quantities $h_j$ of (6.2) satisfy the recurrence

$$h_j = \sum_{l=1}^{m-s} p_l h_{j-l} \tag{9.1}$$

for all positive j.


(9.2) 


Table 8. Parameters of the Symmetrical Weighted Averages Listed in Tables 2 and 3

Designation                         Number       R_0        R_3        Error
                                    of Terms

Minimum-R_3 (Henderson's ideal)        5         .7045      .2735      -.073 δ^4
                                       7         .5971      .1147      -.29 δ^4
                                       9         .5323      .0581      -.76 δ^4
                                      11         .4865      .0331      -1.57 δ^4
                                      13         .4515      .0204      -2.88 δ^4
                                      15         .4234      .0134      -4.85 δ^4
                                      17         .4002      .0095      -7.64 δ^4
                                      19         .3806      .0066      -11.4 δ^4
                                      21         .3636      .0048      -16.5 δ^4
                                      23         .3488      .0036      -23.1 δ^4
Macaulay                              15         .4273      .01657     -4.52 δ^4
Spencer                               15         .4389      .01659     -3.86 δ^4
Woolhouse                             15         .4602      .0654      -5.4 δ^4
Hardy                                 17         .4059      .0105      [illegible] δ^2 - 3.70 δ^4
Higham                                17         .4127      .0179      -6.4 δ^4
Karup                                 19         .4036      .0095      -7.8 δ^4
Andrews                               21         .3707      .00628     -14.9 δ^4
Spencer                               21         .3784      .00626     -12.6 δ^4
Hardy, wave-cutting                   23         .3332      .0154      -48.8 δ^4
Vaughan A                             23         .3415      .0050      -26.6 δ^4
Kenchington                           27         .3202      .0031      -22.4 δ^4

$$c\, p(z)\, p(z^{-1})\, h(z) = 1 . \tag{9.3}$$

Now, $[p(z)]^{-1}$ has an expansion in negative powers of z, with exponents not greater
than -m + s, whose region of convergence contains the unit circle. Call it b(z). Then,

$$c\, p(z^{-1})\, h(z) = b(z) ,$$

from which it follows that (9.1) holds for all positive j, and the proof is complete.
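
The recurrence (9.1) can be checked numerically in the simplest nontrivial case,
m - s = 1. In the sketch below the particular q(z) is a hypothetical example, and the
coefficients $h_j$ are approximated by an equally spaced sum over the unit circle, which
is an assumption of this illustration rather than a method taken from the text.

```python
import cmath
import math


def laurent_coefficient(q, j, samples=256):
    """h_j in 1/q(z) = sum_j h_j z^j, by an equally spaced sum over the unit circle."""
    total = 0
    for n in range(samples):
        z = cmath.exp(2j * math.pi * n / samples)
        total += z ** (-j) / sum(coef * z ** k for k, coef in q.items())
    return (total / samples).real


# Hypothetical q with m - s = 1: q(z) = 0.25/z + 1 + 0.25 z.
q = {-1: 0.25, 0: 1.0, 1: 0.25}
# z q(z) = 0.25 z^2 + z + 0.25 has zeros -2 +/- sqrt(3); the one inside the unit
# circle gives p(z) = z - p_1 with p_1 = -2 + sqrt(3).
p1 = -2 + math.sqrt(3)
h = [laurent_coefficient(q, j) for j in range(6)]
# With m - s = 1 the recurrence (9.1) reads h_j = p_1 h_{j-1} for j > 0.
recurrence_gaps = [h[j] - p1 * h[j - 1] for j in range(1, 6)]
```

The entries of recurrence_gaps should vanish to within rounding error, so that the
coefficients decay geometrically with ratio $p_1$ on either side of $h_0$.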

Let $D_{11}$ denote the (unknown) square submatrix of order m - s in the upper left
corner of D. Let $P = (p_{ij})$ be a matrix of m - s rows and 2m - 2s columns defined
by

$$p_{ij} = \begin{cases} 0 & \text{for } i > j \\ 1 & \text{for } i = j \\ -p_{j-i} & \text{for } 0 < j - i \le m - s \\ 0 & \text{for } j - i > m - s . \end{cases}$$

Let P be partitioned in the form $[P_1 \; P_2]$, where $P_1$ and $P_2$ are square. Let
$T = (t_{ij})$ denote the Toeplitz matrix of order N - s defined by

$$t_{ij} = h_{i-j} .$$

In other words, T is $D^{-1}$ under assumption (iii) of Section 6.

Lemma 9.2. T is nonsingular and equal to $D^{-1}$ if D is completed by assigning

$$D_{11} = c P_1^T P_1 ,$$

together with a corresponding assignment of the square submatrix of order m - s in the
lower right corner of D.

Proof. Note that if we try to form the product DT, all elements of the product
that do not involve the missing elements in the corners of D have the correct values
(0 or 1). We shall focus on the upper left corner; similar considerations apply to the
lower right corner. The lemma will be proved if it can be shown that the product DT is
indeed the identity if $D_{11}$ (and its counterpart at the lower right) is chosen in the
manner indicated.

Let $D_{12}$ denote the square submatrix of D of order m - s immediately to the
right of $D_{11}$, let $T_{11}$ be the submatrix corresponding to $D_{11}$ in the upper left corner
of T, and $T_{12}$ the one immediately to its right. By symmetry the square submatrix of
order m - s immediately below $T_{11}$ is $T_{12}^T$. The product DT will be the identity if
$D_{11}$ is such that

$$D_{11} T_{11} + D_{12} T_{12}^T = I \tag{9.4}$$

(and if a similar relation holds in the lower right corner).

It is easily verified that

$$D_{12} = c P_1^T P_2 . \tag{9.5}$$

Let $\tilde{D}_{11} = (\tilde{d}_{ij})$ be a square matrix of order m - s defined by

$$\tilde{d}_{ij} = q_{i-j} .$$

The reader will note that replacement of $D_{11}$ by $\tilde{D}_{11}$ (and a corresponding replacement
in the lower right corner) would make D a Toeplitz matrix. It follows from the definition
of h(z) in (6.2) and the Toeplitz character of the matrices involved that

$$D_{12}^T T_{12} + \tilde{D}_{11} T_{11} + D_{12} T_{12}^T = I . \tag{9.6}$$

(The reader may think of the block immediately below $D_{11}$ as moved up to the left of $\tilde{D}_{11}$,
and the block $T_{12}$ moved to a position immediately above $T_{11}$.) It is clear from (9.6)
that (9.4) will be satisfied if

$$D_{12}^T T_{12} + \tilde{D}_{11} T_{11} = D_{11} T_{11} . \tag{9.7}$$

It is easily verified that

$$\tilde{D}_{11} = c \begin{bmatrix} P_1^T & P_2^T \end{bmatrix} \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} = c (P_2^T P_2 + P_1^T P_1) .$$

Substitution of this result and (9.5) in the left member of (9.7) gives

$$c (P_2^T P_1 T_{12} + P_2^T P_2 T_{11} + P_1^T P_1 T_{11}) . \tag{9.8}$$

But

$$P_1 T_{12} + P_2 T_{11} = P \begin{bmatrix} T_{12} \\ T_{11} \end{bmatrix} = 0$$

by Lemma 9.1. Thus (9.8) reduces to $c P_1^T P_1 T_{11}$, and (9.7) is satisfied if
$D_{11} = c P_1^T P_1$. This completes the proof.


Let $L$ denote the $m$ by $N$ submatrix of $I - G = K^T D K$ consisting of the first $m$ rows, and let $A = (a_{ij})$ be the $m$ by $N$ matrix defined by

$$ a_{ij} = \begin{cases} 0 & \text{for } i > j , \\ 1 & \text{for } i = j , \\ -a_{j-i} & \text{for } 0 < j - i \le m , \\ 0 & \text{for } j - i > m , \end{cases} \tag{9.9} $$

where the coefficients $a_j$ were defined in (3.4). Let $A_1$ denote the square submatrix of $A$ consisting of the first $m$ columns.

Lemma 9.3.

$$ L = c A_1^T A . \tag{9.10} $$


Proof. Let $D_1$ denote the submatrix of $D$ consisting of the first $m$ rows, and let $K_{11}$ denote the square submatrix of order $m$ in the upper left corner of $K$. Then it follows from the placement of zeros in $K^T$ that

$$ L = K_{11}^T D_1 K . $$

Let $\tilde{P}$ denote an $m$ by $N - s$ matrix with the elements defined as in $P$ (following the proof of Lemma 9.1). It is easily verified that

$$ A = \tilde{P} K . $$

Let $\tilde{P}_1$ denote the square submatrix of $\tilde{P}$ consisting of the first $m$ columns. Then

$$ A_1^T = K_{11}^T \tilde{P}_1^T , $$

and so

$$ c A_1^T A = c K_{11}^T \tilde{P}_1^T \tilde{P} K . $$

But it follows from the proof of Lemma 9.2 that $c \tilde{P}_1^T \tilde{P} = D_1$, and so

$$ c A_1^T A = K_{11}^T D_1 K = L , $$

as required for the proof of the lemma.
Theorem 9.1. The extension method of Section 3 and the matrix formulation of Section 6 are equivalent.

Proof. Let $A_2$ denote the submatrix of $A$ consisting of the $(m + 1)$th to $(2m)$th columns and let

$$ \hat{A} = (A_1 \;\; A_2) . $$

Let $y^{(0)}$ denote the vector of the $m$ intermediate values obtained from the observations by (3.5), let $y^{(1)}$ and $y^{(2)}$, respectively, denote the vectors of the first $m$ observations and the $(m + 1)$th to $(2m)$th observations, and let $\hat{y}$ denote the vector consisting of $y^{(0)}$ followed by $y^{(1)}$. Then, the extension method requires $\hat{A} \hat{y} = 0$, or

$$ A_1 y^{(0)} = -A_2 y^{(1)} . \tag{9.11} $$

Let $\bar{G} = (\bar{g}_{ij})$ be the square matrix of order $m$ defined by $\bar{g}_{ij} = c_{i-j}$ (where the coefficients $c_j$ were defined in (1.2)), and let $G_{12}$ be the submatrix of $G$ formed from the first $m$ rows and the $(m + 1)$th to $(2m)$th columns. Then the vector of the first $m$ graduated values from the matrix formulation is, by Lemma 9.3,

$$ u^{(1)} = y^{(1)} - c A_1^T (A_1 y^{(1)} + A_2 y^{(2)}) . \tag{9.12} $$

By the extension method, the corresponding vector is

$$ G_{12}^T y^{(0)} + \bar{G} y^{(1)} + G_{12} y^{(2)} . \tag{9.13} $$

But, since $G = I - K^T D K$,

$$ \bar{G} = I - c [A_1^T A_1 + A_2^T A_2] $$

and

$$ G_{12} = -c A_1^T A_2 . $$

Thus, (9.13) reduces to

$$ y^{(1)} - c [A_2^T A_1 y^{(0)} + A_1^T A_1 y^{(1)} + A_2^T A_2 y^{(1)} + A_1^T A_2 y^{(2)}] . $$

The substitution (9.11) reduces this to (9.12), as required.
We note in passing that the computational short cut involving extended values has an analogue in the case of Whittaker smoothing. Especially in actuarial literature, the Whittaker smoothing process is sometimes called the difference-equation method because the difference equation

$$ u_x + (-1)^s \delta^{2s} u_x = y_x \tag{9.14} $$

holds for $x = A + s, A + s + 1, \ldots, B - s$. It was pointed out by Aitken (1926) that (9.14) is satisfied for $x = A, A + 1, \ldots, B$ if we introduce at each end of the data set $s$ extrapolated values of both $y_x$ and $u_x$ satisfying the conditions

$$ u_x = y_x \quad (x = A - j, \; x = B + j; \; j = 1, 2, \ldots, s) , $$

$$ \Delta^s u_x = 0 \quad (x = A - j, \; x = B - j; \; j = 1, 2, \ldots, s) . $$

However, this observation is not helpful from a computational point of view. The attempt to utilize it merely increases the order of the linear system to be solved from $N$ to $N + 2s$.

10.  THE  CHARACTERISTIC  FUNCTION  AND  SCHOENBERG'S 
DEFINITION  OF  A SMOOTHING  FORMULA 

Schoenberg (1946) defined the characteristic function of the MWA (1.2) as

$$ \phi(t) = \sum_{j=-m}^{m} c_j e^{ijt} . \tag{10.1} $$

For a symmetrical MWA this is a real function of the real variable $t$, and can be expressed in the alternative form

$$ \phi(t) = \sum_{j=-m}^{m} c_j \cos jt . $$

It is periodic with period $2\pi$ and equal to unity for $t = 2\pi n$ for all integers $n$.

The effect of MWA's in eliminating or reducing certain waves has been noted (Elphinstone 1951, Hannan 1970). If the input to the smoothing process is a sine wave, which may be represented in the form

$$ y_x = C \cos(rx + h) , \tag{10.2} $$

it can be shown by simple algebraic manipulation that

$$ u_x = y_x \, \phi(2\pi/P) , $$

where $P = 2\pi/r$ is the period of $y_x$. Thus, if $\phi(2\pi/P) = 0$, the wave is annihilated by the smoothing process; the amplitude is severely reduced if $\phi(2\pi/P)$ is close to zero. Thus MWA smoothing is related to the "filtering" processes considered by Wiener (1949) and others.
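The attenuation relation above can be checked numerically. The following is a minimal sketch, not part of the original report: the three-term weights, the wave parameters, and the function names `characteristic_function` and `smooth_interior` are illustrative assumptions.

```python
import numpy as np

def characteristic_function(c, t):
    """phi(t) = sum_j c_j cos(j t) for symmetric weights c = [c_{-m}, ..., c_m]."""
    c = np.asarray(c, dtype=float)
    m = (len(c) - 1) // 2
    j = np.arange(-m, m + 1)
    return float(np.sum(c * np.cos(j * t)))

def smooth_interior(c, y):
    """Interior values u_x = sum_j c_j y_{x-j}; the first and last m are omitted."""
    return np.convolve(y, c, mode="valid")

c = np.array([0.25, 0.5, 0.25])   # assumed symmetric weights with unit sum
C, r, h = 2.0, 0.7, 0.3           # arbitrary amplitude, angular frequency, phase
x = np.arange(50)
y = C * np.cos(r * x + h)

m = (len(c) - 1) // 2
u = smooth_interior(c, y)
phi_r = characteristic_function(c, r)           # note that r = 2*pi/P
max_err = np.max(np.abs(u - phi_r * y[m:-m]))   # zero up to rounding error
phi_at_pi = characteristic_function(c, np.pi)   # a wave of period 2 is annihilated
```

For these assumed weights the characteristic function is $\cos^2 \tfrac{1}{2} t$, so the smoothed interior values are the input wave scaled by $\cos^2 \tfrac{1}{2} r$.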

Schoenberg (1946) defined a smoothing formula as an MWA whose characteristic function $\phi(t)$ satisfies the condition

$$ |\phi(t)| \le 1 \tag{10.4} $$

for all $t$. Later (Schoenberg 1948, 1953) he suggested the stronger condition

$$ |\phi(t)| < 1 \quad (0 < t < 2\pi) . \tag{10.5} $$

C. Lanczos (see Schoenberg 1953) pointed out that condition (10.4) is obtained by requiring that every simple vibration (10.2) be diminished in amplitude by the transformation (1.2). The results of Section 6 of the present paper suggest an alternative definition of a smoothing formula. Using the subscript $N$ to emphasize the fact that the order of $G$ is the number of observations in the data set, we may say that (1.2) is a smoothing formula if

$$ G_N^{\infty} = \lim_{n \to \infty} G_N^n \tag{10.6} $$

exists for all $N$. Schoenberg (1953, footnote 3) suggested a relationship between (10.4) and the conditions for existence of the infinite power of a matrix (Oldenburger 1940, Dresden 1942), but he did not elaborate the connection. We shall show that the existence of the limit (10.6) for all $N$ is equivalent to a condition intermediate between (10.4) and (10.5). The following lemma will help to elucidate the situation.

Lemma 10.1. For a given $\tau$ in $(0, 2\pi)$, $\phi(\tau) = 1$ if and only if $q(e^{i\tau}) = 0$.

Proof. From (3.1), (3.2), and (10.1) it follows that

$$ \phi(t) = 1 - (-1)^s (2i \sin \tfrac{1}{2} t)^{2s} q(e^{it}) = 1 - (4 \sin^2 \tfrac{1}{2} t)^s q(e^{it}) . \tag{10.7} $$

Since $\sin \tfrac{1}{2} \tau \ne 0$, the lemma is established.
There are two ways in which equality can hold in (10.4), namely $\phi(t) = 1$ and $\phi(t) = -1$, and the situation is different in the two cases. Lemma 10.1 shows that if $\phi(t) = 1$, $q(z)$ has a zero on the unit circle and consequently $G$ is not defined. No such problem arises if $\phi(t) = -1$. We are therefore led to the intermediate condition

$$ -1 \le \phi(t) < 1 \quad (0 < t < 2\pi) , \tag{10.8} $$

which we shall show to be equivalent to the existence of (10.6).

Lemma 10.2. If (10.8) holds, $D$ is positive definite.

Proof. From (9.2) we obtain

$$ q(1) = c [p(1)]^2 . $$

It follows from (10.7) and (10.8) that $q(e^{it})$ is positive for $0 < t < 2\pi$. Since it is a continuous function of $t$, it is nonnegative for $t = 0$: that is, $q(1)$ is nonnegative. By the definition of $p(z)$, $p(1) \ne 0$, and $c = -q_{m-s}/p_{m-s}$ does not vanish. Therefore $q(1)$ is positive and $c$ is positive.

Let the expansion of $b(z)$ of Section 9 be given by

$$ b(z) = \sum_{j=m-s}^{\infty} b_j z^{-j} . \tag{10.9} $$

It follows from (9.3) that on the unit circle

$$ b(z) \, b(z^{-1}) = c^{-1} h(z) . \tag{10.10} $$

Substitution of (10.9) gives

$$ \sum_{i=m-s}^{\infty} b_i b_{i+j} = c^{-1} h_j \quad (j = \ldots, -1, 0, 1, \ldots) . \tag{10.11} $$


We note that the coefficients $b_j$ satisfy, for $j = m - s + 1, m - s + 2, \ldots$, a linear difference equation of order $m - s$ whose characteristic polynomial is $p(z)$. If $r_1, r_2, \ldots, r_{m-s}$ are the zeros of $p(z)$, it follows that

$$ b_j = \sum_{k=1}^{m-s} \alpha_k r_k^j $$

for some constant coefficients $\alpha_k$. Therefore the series in the left member of (10.11) is absolutely convergent.

Let $B = (b_{ij})$ denote the matrix of $N - s$ rows and a denumerable infinity of columns given by

$$ b_{ij} = \begin{cases} 0 & (i > j) , \\ b_{m-s+j-i} & (i \le j) . \end{cases} \tag{10.12} $$

It follows from (10.11) and (6.1) that

$$ D^{-1} = c B B^T . \tag{10.13} $$

The structure of the right member of (10.13) shows that $D^{-1}$ is nonnegative definite; since it is nonsingular, it is positive definite, and consequently $D$ is positive definite, as required.


Let $\psi(t)$ be defined by

$$ \psi(t) = 1 - \phi(t) . $$

Then (10.8) is equivalent to

$$ 0 < \psi(t) \le 2 \quad (0 < t < 2\pi) . \tag{10.14} $$

Let $\psi_{\max}$ denote the maximum value of $\psi(t)$, and let $A = (a_{ij})$ be the square matrix of order $N$ defined by (9.9). Let $|v|$ denote the Euclidean norm of a vector $v$.

Lemma 10.3. If $v$ is any vector of $N$ real components and $\psi(t)$ is positive in $(0, 2\pi)$,

$$ |Av|^2 / |v|^2 \le c^{-1} \psi_{\max} . \tag{10.15} $$


Proof. Let $v$ be an arbitrary vector of $N$ real components, with $j$th component $v_j$, and let

$$ V(t) = \sum_{j=1}^{N} v_j e^{ijt} \tag{10.16} $$

be the characteristic function of $v$. Then, if $a(z)$ is the polynomial defined by (3.4),

$$ a(e^{it}) V(t) = \sum_{j=1}^{N+m} w_j e^{ijt} , $$

where, for $j > m$, $w_j$ is the $(j - m)$th component of $Av$. Moreover, it follows from (3.4), (9.2), and (10.7) that

$$ \psi(t) = c \, a(e^{it}) \, a(e^{-it}) . \tag{10.17} $$

By Parseval's formula (Schoenberg 1946)

$$ |v|^2 = \frac{1}{2\pi} \int_0^{2\pi} |V(t)|^2 \, dt . \tag{10.18} $$

Similarly, in view of (10.17),

$$ |Av|^2 \le \frac{1}{2\pi} \int_0^{2\pi} |a(e^{it})|^2 |V(t)|^2 \, dt = \frac{c^{-1}}{2\pi} \int_0^{2\pi} \psi(t) |V(t)|^2 \, dt . \tag{10.19} $$

From (10.18) and (10.19), (10.15) follows.

It is easily verified that the symmetric matrix $c A^T A$ is identical with $F = I - G$ except for the elements of the square submatrix of order $m$ in the lower right corner. In fact, the elements in the lower right corner of $A^T A$ are such that the entire matrix becomes a Toeplitz matrix if the first $m$ rows and the first $m$ columns are deleted. It follows that $F$ can be obtained from $c A^T A$ by subtracting a square matrix of order $N$ whose elements are all zero except for the square submatrix of order $m$ in the lower right corner.

In fact, if $A_1$ and $A_2$ are defined as in the proof of Theorem 9.1, it is easily verified that the square submatrix of order $m$ in the lower right corner of $c A^T A$ is given by

$$ c \begin{bmatrix} A_2^T & A_1^T \end{bmatrix} \begin{bmatrix} A_2 \\ A_1 \end{bmatrix} = c (A_2^T A_2 + A_1^T A_1) , $$


while  the  corresponding  submatrix  of  F is  cA*  A^  . If,  therefore,  A is  defined  as 
the  square  matrix  of  order  N having  A^  in  the  lower  right  corner  and  zeros  everywhere 
else,  we  have 

Lemma 10.4.

$$ F = I - G = c (A^T A - \hat{A}^T \hat{A}) . $$

Before stating the theorem that is the main result of this section, we point out (Oldenburger 1940, Dresden 1942) that, for a given matrix $C$, $\lim_{n \to \infty} C^n$ exists if and only if either all eigenvalues of $C$ lie within the unit circle, or else 1 is a simple zero of the minimum polynomial of $C$ and all other eigenvalues lie within the unit circle. As multiplication by $G$ leaves unchanged vectors whose components are successive equally spaced ordinates of a polynomial of degree $s - 1$ or less, it is clear that 1 is an eigenvalue. As $G$ is symmetric, all its eigenvalues are real, and all zeros of its minimum polynomial (including 1) are simple. Therefore the limit (10.6) exists if and only if all eigenvalues of $G$ other than 1 are strictly between $-1$ and $1$.

Theorem 10.1. The limit $G_N^{\infty}$ exists for all $N$ if and only if the characteristic function $\phi(t)$ satisfies the condition

$$ -1 \le \phi(t) < 1 \quad (0 < t < 2\pi) . \tag{10.8} $$


Proof. Let (10.8) hold, and recall that (10.8) is equivalent to (10.14). Now we shall consider a particular value of $N$ and, for convenience, drop the subscript $N$. The eigenvalues of $F$ are obtained by subtracting from 1 those of $G$. We need to show, therefore, that the nonzero eigenvalues of $F$ are positive and less than 2. Since $F = K^T D K$ and $D$ is positive definite by Lemma 10.2, $F$ is nonnegative definite. Therefore its nonzero eigenvalues are positive. Let $v$ be an arbitrary nonzero real vector and consider the Rayleigh quotient

$$ r = \frac{v^T F v}{v^T v} = c \, \frac{v^T A^T A v}{v^T v} - c \, \frac{v^T \hat{A}^T \hat{A} v}{v^T v} $$

by Lemma 10.4. Since $\hat{A}^T \hat{A}$ is nonnegative definite, we have

$$ r < c |Av|^2 / |v|^2 \le \psi_{\max} $$

by Lemma 10.3. Therefore (10.14) gives $r < 2$. Since the spectral radius is the maximum value of the Rayleigh quotient, this completes the first part of the proof.

We shall prove the converse by showing that if (10.8) fails, the limit (10.6) does not exist for some $N$. There are two ways in which (10.8) can fail. Either $\phi(t)$ may be equal to 1, or $|\phi(t)|$ may exceed 1. In the former case, as was pointed out immediately following Lemma 10.1, $G$ is not defined.

We consider therefore the case in which $|\phi(t)| > 1$ for some $t = \tau$, and let $v$ be the $N$-vector whose $j$th component is

$$ v_j = \exp[i \tau (j - \tfrac{N+1}{2})] . \tag{10.20} $$

Using an asterisk to denote the conjugate transpose, we have $\bar{v}_j v_j = 1$, and therefore

$$ v^* v = N . \tag{10.21} $$

Except for the first $m$ and the last $m$ components we have

$$ (Gv)_j = \phi(\tau) v_j , $$

and so

$$ v^* G v = \phi(\tau) (N - 2m) + C , \tag{10.22} $$

where $C$ denotes the contribution of the first $m$ and the last $m$ components. Because of the symmetry of both the matrix elements and the vector components, $C$ is real. Since all the vector components have absolute value 1, an upper bound to $|C|$ is the sum of the absolute values of the elements in the first $m$ and the last $m$ rows of $G$. Call this $C_1$. We recall that $C_1$ does not depend on $N$.

Now choose $N$ sufficiently large so that

$$ N > \frac{C_1 + 2m |\phi(\tau)|}{|\phi(\tau)| - 1} . $$

Consequently,

$$ N [|\phi(\tau)| - 1] > C_1 + 2m |\phi(\tau)| , $$

and it follows that

$$ (N - 2m) |\phi(\tau)| > N + |C| $$

and

$$ |(N - 2m) \phi(\tau) + C| > N , $$

and therefore, by (10.21) and (10.22),

$$ \left| \frac{v^* G v}{v^* v} \right| > 1 . $$

It follows that the spectral radius of $G$ is greater than unity, and the proof is complete.
It is easily verified that $G^{\infty}$, when it exists, is the orthogonal projector on the eigenspace of $G$ associated with the eigenvalue 1, that is, the space of $N$-vectors whose components are successive equally spaced ordinates of polynomials of degree $s - 1$ or less.
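Theorem 10.1 and the projector property can be illustrated in the simplest case. The sketch below is an assumption-laden example rather than the report's own computation: it specializes to a three-term MWA with weights $(c_1, 1 - 2c_1, c_1)$, for which $m = s = 1$, $q(z)$ reduces to the constant $c_1$, and hence $D = c_1 I$. The helper names are hypothetical.

```python
import numpy as np

def difference_matrix(N, s):
    """(N - s) x N matrix K mapping a vector to its s-th finite differences."""
    K = np.eye(N)
    for _ in range(s):
        K = K[1:, :] - K[:-1, :]
    return K

def graduation_matrix(N, c1):
    """G = I - K^T D K with D = c1 * I (assumed case m = s = 1)."""
    K = difference_matrix(N, 1)
    return np.eye(N) - c1 * (K.T @ K)

# Here phi(t) = 1 - 4*c1*sin^2(t/2), so (10.8) holds exactly when 0 < c1 <= 1/2.
N = 8
G = graduation_matrix(N, 0.25)
G_limit = np.linalg.matrix_power(G, 2000)    # numerical stand-in for G^infinity
projector = np.full((N, N), 1.0 / N)         # projector onto constant vectors (s - 1 = 0)

# With c1 = 0.6, phi(pi) = -1.4 violates (10.8); for N large enough the
# spectral radius of G exceeds 1 and the powers of G diverge.
G_bad = graduation_matrix(20, 0.6)
rho_bad = np.max(np.abs(np.linalg.eigvalsh(G_bad)))
```

In this special case the limit of the powers of $G$ is the matrix with every element equal to $1/N$, which replaces each observation by the mean of the data.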

11.  SMOOTHING  FORMULAS  IN  THE  STRICT  SENSE  AND  AN  OPTIMAL  PROPERTY 


At  an  early  stage  of  the  investigations  underlying  this  paper  I was  trying  to  explain 
the  natural  extension  of  the  MWA  graduation  to  my  colleague,  I.  J.  Schoenberg,  whose  work 
plays  such  an  important  role  therein,  and  he  asked  me  (I  thought  with  a slight  show  of 
impatience) "What does it minimize?" My answer was that it doesn't minimize anything, but
is  just  a natural  way  of  extending  the  MWA  graduation  to  the  ends  of  the  data.  This  was 
too  simplistic  an  answer,  for  we  shall  now  show  that  it  does  in  fact  minimize  "something." 

In a slightly more general form of the Whittaker smoothing method (Greville 1957) one minimizes the sum of the squares of the departures of the smoothed values from the observed values plus a designated positive definite quadratic form in the $s$th differences of the smoothed values. In other words, one minimizes

$$ (u - y)^T (u - y) + (Ku)^T H K u , $$

where $H$ is a given positive definite matrix of order $N - s$. Minimization of this expression leads to the equation

$$ (I + K^T H K) u = y , $$

which has a unique solution for $u$ since $I + K^T H K$ is positive definite. I showed (Greville 1957) that this graduation method has the interesting property that if roughness (opposite of smoothness) is measured by the term $(Ku)^T H K u$, smoothness is always increased by the graduation. By Theorem 5.22 of Noble (1969),

$$ (I + K^T H K)^{-1} = I - K^T (H^{-1} + K K^T)^{-1} K . $$

The last expression is of the form (5.6) and suggests that the use of an MWA with the natural extension might be regarded as a generalized Whittaker smoothing process if

$$ D = (H^{-1} + K K^T)^{-1} . $$

Solving for $H$ gives

$$ H = (D^{-1} - K K^T)^{-1} . \tag{11.1} $$

We are led to inquire, therefore, under what conditions an MWA is such that the right member of (11.1) is positive definite. Clearly $H$ is positive definite if and only if the Toeplitz matrix

$$ H^{-1} = D^{-1} - K K^T \tag{11.2} $$

is positive definite.
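The matrix identity quoted from Noble and the correspondence between $H$ and $D$ in (11.1) can be checked numerically. The sketch below uses an arbitrary randomly generated positive definite $H$ and arbitrary data; the sizes, the seed, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 9, 2

K = np.eye(N)
for _ in range(s):
    K = K[1:, :] - K[:-1, :]               # (N - s) x N matrix of s-th differences

M = rng.standard_normal((N - s, N - s))
H = M @ M.T + (N - s) * np.eye(N - s)      # an arbitrary positive definite H

I = np.eye(N)
lhs = np.linalg.inv(I + K.T @ H @ K)
D = np.linalg.inv(np.linalg.inv(H) + K @ K.T)
rhs = I - K.T @ D @ K
identity_gap = np.max(np.abs(lhs - rhs))   # zero up to rounding error

# Recover H from D as in (11.1).
H_back = np.linalg.inv(np.linalg.inv(D) - K @ K.T)

# Generalized Whittaker graduation of arbitrary data, with roughness
# measured by (Ku)^T H (Ku) before and after.
y = rng.standard_normal(N)
u = np.linalg.solve(I + K.T @ H @ K, y)
rough_before = (K @ y) @ H @ (K @ y)
rough_after = (K @ u) @ H @ (K @ u)
```

Explicit inverses are used only to mirror the formulas; for actual smoothing, solving the linear system as in the last step is the more usual numerical choice.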


Schoenberg (1946, p. 53) remarks that it is desirable for an efficient smoothing formula, one that achieves adequate smoothness without producing unnecessarily large departures from the observed values, to have its characteristic function satisfy the stronger condition

$$ 0 \le \phi(t) \le 1 . $$

This remark seems to have been little noted in the years since its publication. We shall call an MWA a smoothing formula in the strict sense if its characteristic function satisfies the condition

$$ 0 \le \phi(t) < 1 \quad (0 < t < 2\pi) , \tag{11.3} $$

and we shall show that (11.2) is positive definite for all $N$ if and only if (11.3) holds.

Theorem 11.1. Let (10.8) hold. Then $Q = D^{-1} - K K^T$ is positive definite for all $N$ if and only if the MWA is a smoothing formula in the strict sense.

Proof. Let (11.3) hold, let $v$ be an arbitrary nonzero real $N$-vector, and consider the Rayleigh quotient,

$$ r = \frac{v^T Q v}{v^T v} = (c |B^T v|^2 - |K^T v|^2) / |v|^2 , \tag{11.4} $$

where $B$ is given by (10.12). Let $V(t)$ be defined by (10.16). Then, by Parseval's formula,

$$ |B^T v|^2 = \frac{1}{2\pi} \int_0^{2\pi} |e^{-(m-s)it} b(e^{-it})|^2 |V(t)|^2 \, dt = \frac{c^{-1}}{2\pi} \int_0^{2\pi} h(e^{it}) |V(t)|^2 \, dt \tag{11.5} $$

by (10.10). Moreover, again by Parseval's formula,

$$ |K^T v|^2 = \frac{1}{2\pi} \int_0^{2\pi} (4 \sin^2 \tfrac{1}{2} t)^s |V(t)|^2 \, dt . \tag{11.6} $$

It was shown in the proof of Lemma 10.2 that $q(e^{it})$ is positive, and therefore $h(e^{it}) = [q(e^{it})]^{-1}$ is positive, for $0 < t < 2\pi$. By means of (11.5), (11.6), and (10.7), (11.4) gives

$$ r = \frac{1}{2\pi |v|^2} \int_0^{2\pi} [h(e^{it}) - (4 \sin^2 \tfrac{1}{2} t)^s] |V(t)|^2 \, dt = \frac{1}{2\pi |v|^2} \int_0^{2\pi} h(e^{it}) \phi(t) |V(t)|^2 \, dt > 0 , $$

since any zeros of $\phi(t)$ constitute a set of measure zero. Since $r$ is positive for arbitrary nonzero $v$, $Q$ is positive definite.

To prove the converse we shall suppose that (11.3) does not hold and show that $r$ is negative for some $N$ and some $v$. Because of the hypothesis that (10.8) holds, $\phi(t)$ is less than 1. We suppose, therefore, that $\phi(t) < 0$ for some $t = \tau$ in $(0, 2\pi)$. Let $v$ be given by (10.20). By an argument similar to that used in the proof of Lemma 10.2, it is easily shown that the series (6.2) for $h(z)$ is absolutely convergent for $|z| = 1$. Thus, for any small positive quantity $\epsilon$, there exists a positive integer $M$ such that

$$ \sum_{j=M+1}^{\infty} |h_j| < \tfrac{1}{2} \epsilon . $$

Thus, for $N > 2M$, the $j$th component of $Qv$, for $j = M + 1, M + 2, \ldots, N - M$, is $v_j$ multiplied by a quantity less than

$$ h(e^{i\tau}) + \epsilon - (4 \sin^2 \tfrac{1}{2} \tau)^s . \tag{11.7} $$

By (10.7),

$$ h(e^{i\tau}) - (4 \sin^2 \tfrac{1}{2} \tau)^s = h(e^{i\tau}) \phi(\tau) . \tag{11.8} $$

It was shown in the proof of Lemma 10.2 that (10.8) implies $h(e^{i\tau}) > 0$. Thus, (11.8) is negative. Choose $\epsilon$ sufficiently small so that (11.7) is negative.

As in the proof of Theorem 10.1, we find that $v^* v = N$, while

$$ v^* Q v \le (N - 2M) [h(e^{i\tau}) \phi(\tau) + \epsilon] + C , \tag{11.9} $$

where $C$ denotes the contribution of the first $M$ and the last $M$ rows. As in the proof of Theorem 10.1, it follows from symmetry that $C$ is real. Let

$$ \eta = \sum_{j=-\infty}^{\infty} \left| h_{j-s} - (-1)^{s-j} \binom{2s}{j} \right| , $$

where $\binom{2s}{j}$ is understood to vanish for negative $j$ or $j > 2s$. Then

$$ C \le |C| < 2M\eta , $$

since all components of $v$ have absolute value unity. Now take

$$ N > \frac{2M [h(e^{i\tau}) \phi(\tau) + \epsilon - \eta]}{h(e^{i\tau}) \phi(\tau) + \epsilon} . \tag{11.10} $$

Note that both numerator and denominator of the right member of (11.10) are negative. From (11.10) we obtain

$$ (N - 2M) [h(e^{i\tau}) \phi(\tau) + \epsilon] < -2M\eta , $$

and (11.9) then gives $v^* Q v < 0$, so that $Q$ is not positive definite. This completes the proof of the theorem.

It is easy to construct an MWA that is a smoothing formula in the strict sense. A trivial example is the formula

$$ u_x = \tfrac{1}{17} (-y_{x-2} + 4 y_{x-1} + 11 y_x + 4 y_{x+1} - y_{x+2}) . $$
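That this five-term formula satisfies (11.3) can be confirmed on a grid. The sketch below is illustrative only; the grid size and variable names are assumptions. It also checks the moments of the weights, which show that the formula reproduces cubics.

```python
import numpy as np

c = np.array([-1.0, 4.0, 11.0, 4.0, -1.0]) / 17.0   # weights c_{-2}, ..., c_2
j = np.arange(-2, 3)

# Sample the open interval (0, 2*pi); the grid contains t = pi exactly.
t = np.linspace(0.0, 2.0 * np.pi, 2001)[1:-1]
phi = (c[None, :] * np.cos(np.outer(t, j))).sum(axis=1)
phi_min, phi_max = phi.min(), phi.max()

# Moments sum_j c_j j^k for k = 0..3: unit sum, and vanishing first,
# second, and third moments.
moments = [float(np.sum(c * j ** k)) for k in range(4)]
```

Writing $\cos t = w$, the characteristic function is $(13 + 8w - 4w^2)/17$, which increases from $1/17$ at $w = -1$ to $1$ at $w = 1$, in agreement with the sampled minimum and maximum.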

However, none of the weighted averages in general use fall in this class. In particular, using the properties of Jacobi polynomials, I have shown elsewhere (Greville 1966) that the characteristic functions of all minimum-$R_s$ averages assume negative values in $(0, 2\pi)$. Thus no such formula is a smoothing formula in the strict sense.

There is, however, one family of moving averages, mentioned in the literature but not in general use, that are smoothing formulas in the strict sense. Elsewhere (Greville 1966) I have considered the limiting case of the minimum-$R_s$ formulas as $s$ tends to infinity. In finite-difference form, the minimum-$R_\infty$ MWA of $2m + 1$ terms, exact for the degree $2s - 1$, is

$$ u_x = \mu^{2(m-s+1)} \sum_{j=0}^{s-1} (-4)^{-j} \binom{m-s+j}{j} \delta^{2j} y_x , $$

where the operator $\mu$ is defined by

$$ \mu f(x) = \tfrac{1}{2} [f(x + \tfrac{1}{2}) + f(x - \tfrac{1}{2})] , $$

so that $\mu^2 = 1 + \tfrac{1}{4} \delta^2$. The characteristic function is

$$ \phi(t) = (\cos^2 \tfrac{1}{2} t)^{m-s+1} \sum_{j=0}^{s-1} \binom{m-s+j}{j} (\sin \tfrac{1}{2} t)^{2j} , $$

which is nonnegative, with a single zero of order $m - s + 1$ at $t = \pi$.
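The finite-difference form can be turned into explicit weights by treating $\mu^2$ and $\delta^2$ as three-term convolution kernels. The following sketch rests on that assumption; the function names are hypothetical, and it is meant for $m \ge s \ge 1$.

```python
import numpy as np
from math import comb

def min_r_infinity_weights(m, s):
    """Weights c_{-m}, ..., c_m obtained by expanding the operator formula."""
    mu2 = np.array([0.25, 0.5, 0.25])    # kernel of mu^2
    d2 = np.array([1.0, -2.0, 1.0])      # kernel of delta^2
    # Centered kernel of sum_{j<s} (-4)^(-j) C(m-s+j, j) delta^(2j), length 2s - 1.
    series = np.zeros(2 * s - 1)
    d2_power = np.array([1.0])
    for j in range(s):
        pad = (s - 1) - j
        series += (-4.0) ** (-j) * comb(m - s + j, j) * np.pad(d2_power, pad)
        d2_power = np.convolve(d2_power, d2)
    w = series
    for _ in range(m - s + 1):           # apply mu^2 a total of m - s + 1 times
        w = np.convolve(w, mu2)
    return w

def phi_from_weights(w, t):
    m = (len(w) - 1) // 2
    j = np.arange(-m, m + 1)
    return (w[None, :] * np.cos(np.outer(t, j))).sum(axis=1)

def phi_closed_form(m, s, t):
    total = sum(comb(m - s + j, j) * np.sin(t / 2) ** (2 * j) for j in range(s))
    return np.cos(t / 2) ** (2 * (m - s + 1)) * total

t = np.linspace(0.0, 2.0 * np.pi, 401)
w = min_r_infinity_weights(4, 2)
gap = np.max(np.abs(phi_from_weights(w, t) - phi_closed_form(4, 2, t)))
```

For $m = 2$, $s = 2$ this construction gives the weights $(-1, 4, 10, 4, -1)/16$, whose characteristic function vanishes at $t = \pi$.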

By a tour de force it is possible to show that, for an MWA that is not a smoothing formula in the strict sense, but whose characteristic function satisfies (10.8), the natural extension does nevertheless "minimize something." For the given MWA, let $-\rho$ denote the minimum value of $\phi(t)$, and let $\gamma$ be chosen so that $0 < \gamma \le (1 + \rho)^{-1}$. Then $1 - \gamma (1 + \rho) \ge 0$. Let a modified MWA, $\bar{u}_x$, be obtained by taking

$$ \bar{u}_x = \gamma u_x + (1 - \gamma) y_x . \tag{11.11} $$

Clearly this is an MWA of the form (1.2),

$$ \bar{u}_x = \sum_{j=-m}^{m} \bar{c}_j y_{x-j} , $$

with $\bar{c}_0 = \gamma c_0 + 1 - \gamma$ and $\bar{c}_j = \gamma c_j$ for $j \ne 0$. The modified MWA is a smoothing formula in the strict sense, and its graduation matrix is $\bar{G} = I - K^T \bar{D} K$, with $\bar{D} = \gamma D$.
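The modification (11.11) can be tried on a familiar average. As an assumed example (not taken from the report), the sketch below uses the simple three-term mean, whose characteristic function $(1 + 2 \cos t)/3$ dips to $-1/3$, and takes the largest admissible $\gamma$. The function names are hypothetical.

```python
import numpy as np

def phi(c, t):
    """Characteristic function sum_j c_j cos(j t) on an array of t values."""
    c = np.asarray(c, dtype=float)
    m = (len(c) - 1) // 2
    j = np.arange(-m, m + 1)
    return (c[None, :] * np.cos(np.outer(t, j))).sum(axis=1)

def modified_weights(c, gamma):
    """Weights of the modified MWA: gamma * c_j, plus 1 - gamma at j = 0."""
    c_bar = gamma * np.asarray(c, dtype=float)
    c_bar[len(c_bar) // 2] += 1.0 - gamma
    return c_bar

c = np.full(3, 1.0 / 3.0)                        # simple three-term mean
t = np.linspace(0.0, 2.0 * np.pi, 2001)[1:-1]    # open interval; contains pi
rho = -phi(c, t).min()                           # equals 1/3 here
gamma = 1.0 / (1.0 + rho)                        # largest admissible value, 3/4
c_bar = modified_weights(c, gamma)
phi_bar_min = phi(c_bar, t).min()
```

With these choices the modified weights are $(1/4, 1/2, 1/4)$, whose characteristic function $\cos^2 \tfrac{1}{2} t$ is nonnegative, as the construction requires.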

The modified graduation minimizes the quantity

$$ (\bar{u} - y)^T (\bar{u} - y) + (K \bar{u})^T \bar{H} K \bar{u} , \tag{11.12} $$

where

$$ \bar{H} = (\bar{D}^{-1} - K K^T)^{-1} \tag{11.13} $$

is positive definite. Using (11.11) and (11.13) to express (11.12) in terms of the original graduated values, we find that the quantity minimized is

$$ (u - y)^T (u - y) + [u + (\gamma^{-1} - 1) y]^T K^T \bar{Q}^{-1} K [u + (\gamma^{-1} - 1) y] , \tag{11.14} $$

where

$$ \bar{Q} = \gamma^{-1} D^{-1} - K K^T $$

is positive definite. Thus, the total smoothing operation including the "tails," based on an MWA that is a smoothing formula, but not in the strict sense, does in fact minimize the expression (11.14). Using statistical terminology, this expression may therefore be regarded as a "loss function," but in that context is difficult to interpret and justify in practical terms.